ABSTRACT

Random sampling is an essential tool in the processing and transmission of
data. It is used to summarize data too large to store or manipulate and to meet
resource constraints on bandwidth or battery power. Estimators that are
applied to the sample facilitate fast approximate processing of queries posed
over the original data, and the value of the sample hinges on the quality of
these estimators.

Our work targets data sets such as request and traffic logs and sensor
measurements, where data is repeatedly collected over multiple instances:
time periods, locations, or snapshots. We are interested in queries that span
multiple instances, such as distinct counts and distance measures over
selected records. These queries are used for applications ranging from
planning to anomaly and change detection.

Unbiased low-variance estimators are particularly effective, as the relative
error decreases with the number of selected record keys. The Horvitz-Thompson
estimator, known to minimize variance for sampling with "all or
nothing" outcomes (which reveal either the exact value or no information on the
estimated quantity), is not optimal for multi-instance operations, for which an
outcome may provide partial information.

We present a general principled methodology for the derivation of (Pareto) 
optimal unbiased estimators over sampled instances and aim to understand 
its potential. We demonstrate significant improvement in estimate accuracy 
of fundamental queries for common sampling schemes. 

1. INTRODUCTION 

Random sampling has become an essential tool in the handling
of data. It is used to accommodate resource constraints on storage,
bandwidth, energy, and processing power. Massive data sets can
be too large to be stored long term or transmitted, sensor nodes
collecting measurements are energy limited, and even when the full
data is available, computation of exact aggregates may be slow and
costly.

The sample constitutes a summary of the original data sets that is
small enough to store, transmit, and manipulate in a single location
and yet supports computation of approximate queries over the
original data. It is flexible in that many types of queries are supported
and that queries need not be known a priori [31, 39, 5, 4, 9, 25, 26,
2, 21, 27, 13, 22, 10, 14].

Commonly, data has the form of multiple instances which are
dispersed in time or location. Each instance corresponds to an
assignment of values to a set of identifiers (keys). The universe of key
values is shared between instances but the values change. This data
can be modeled as a numeric matrix of instances × keys. Instances
can be snapshots of a database that is modified over time,
measurements from sensors or of parameters taken in different time periods,
or number of requests for resources processed at multiple servers.
Clearly, any scalable summarization algorithm of dispersed data
must decouple the processing of different instances: the processing
of one instance must not depend on values in other instances.

¹This is a full version of [15].



An important class of query primitives are functions with arguments
that span values assumed by a key in multiple instances, such
as quantiles (maximum, minimum, median) or the range (difference
between maximum and minimum). Sum aggregates of these
primitives over selected subsets of keys [32, 17] include distinct
element count (size of union), max-dominance and min-dominance
norms [19, 20], and the Manhattan ($L_1$) distance, and are used for
change or anomaly detection, similarity-based clustering, monitoring,
and planning. See the example in Figure 5(A).

Popular sampling schemes for a single instance are Poisson, where
keys are sampled independently; bottom-k (order) [36, 12, 22, 13],
where keys are assigned random rank values and the k
smallest-ranked keys are selected (as in weighted sampling without
replacement and priority sampling); and VarOpt [10, 6].

The Horvitz-Thompson (HT) estimator [29], based on inverse-probability
weights, is a classic method for estimating subset sums
of values of keys: The estimate of the value of a key is 0 if it
is not included in the sample and the ratio of its true value to
the inclusion probability otherwise. The estimate of the sum of
values of a subset of keys is the sum of estimates over sampled
keys that are members of the subset. This estimator is unbiased and
has minimum variance amongst unbiased nonnegative estimators.
A variant of HT is used for bottom-k sampling [22, 38, 17].
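The subset-sum HT estimator described above can be sketched in Python as follows. This is a minimal illustration under assumed conditions (Poisson sampling of a single instance with known per-key inclusion probabilities); the helper names `poisson_sample`, `ht_subset_sum`, and `exact_moments` are hypothetical. Exhaustive enumeration of all samples confirms unbiasedness and gives the variance as the sum over the subset of the per-key terms $v^2(1/p - 1)$.

```python
import itertools
import random


def poisson_sample(values, probs, rng):
    """Poisson sampling: keep each key independently with probability probs[key]."""
    return {k: v for k, v in values.items() if rng.random() < probs[k]}


def ht_subset_sum(sample, probs, subset):
    """HT estimate of the sum of values over `subset`.

    A sampled key contributes value / inclusion probability; an unsampled key contributes 0.
    """
    return sum(v / probs[k] for k, v in sample.items() if k in subset)


def exact_moments(values, probs, subset):
    """Exact mean and variance of the HT estimate, enumerating all 2^n possible samples."""
    keys = list(values)
    mean = second = 0.0
    for mask in itertools.product((False, True), repeat=len(keys)):
        prob, sample = 1.0, {}
        for k, kept in zip(keys, mask):
            prob *= probs[k] if kept else 1.0 - probs[k]
            if kept:
                sample[k] = values[k]
        est = ht_subset_sum(sample, probs, subset)
        mean += prob * est
        second += prob * est * est
    return mean, second - mean * mean


values = {"a": 3.0, "b": 1.0, "c": 5.0}
probs = {"a": 0.5, "b": 0.25, "c": 0.5}
one_estimate = ht_subset_sum(poisson_sample(values, probs, random.Random(7)), probs, {"a", "b"})
mean, var = exact_moments(values, probs, {"a", "b"})
# mean is the true subset sum 3 + 1 = 4; var = 3**2 * (1/0.5 - 1) + 1**2 * (1/0.25 - 1) = 12
```

A single draw (`one_estimate`) can be far from 4; the guarantee is only on the expectation, which is why relative error shrinks as more keys are aggregated.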

Previous estimators we are aware of for multi-instance functions 
are based on an adaptation of HT: a positive estimate is provided 
only on samples that revealed sufficient information to compute 
the exact value of the estimated quantity. We observe that such 
estimators may not be optimal for multi-instance functions, where 
outcomes can provide partial information on the estimated value. 
We aim to understand the form and potential performance gain of 
better estimators. 

Contribution: We characterize the joint sample distributions
attainable for dispersed instances, that is, when processing of each
instance may not depend on values of another.

Our main contribution is a principled methodology for deriving
optimal estimators for multi-instance functions, taking the sampling
scheme as a given. The sample of each instance can be
Poisson, VarOpt, or bottom-k. Sampling can be weighted (inclusion
probability in the sample depends on the value) or weight-oblivious.
The joint distribution (samples of different instances)
can be independent or coordinated. Coordination, achieved using
random hash functions, means that similar instances get similar
samples [3, 33, 36, 5, 25, 26, 2, 13, 27] and can boost
estimation quality of multi-instance functions [17, 18].

We provide example derivations of optimal estimators for basic 
aggregations over common sampling distributions and demonstrate 
significant gain, in terms of lower variance, over state-of-the-art 
estimators. Optimality is in a Pareto sense with respect to variance: 
any other nonnegative estimator with lower variance on some data
must have higher variance on some other data. 

A key component in attaining optimality is the use of partial
information, which we motivate by the following simple scenario.
Consider estimating the maximum of two values, $v_1$ and $v_2$,
sampled independently with respective probabilities $p_1$ and $p_2$. We
can be certain about the value $\max(v_1, v_2)$ only when both values
are sampled, which happens with probability $p_1 p_2$. The
inverse-probability weight is $\max(v_1, v_2)/(p_1 p_2)$ when both values are
sampled and 0 otherwise, and is an unbiased estimate. We now
observe that when exactly one of the values is sampled, we know
that the maximum is at least that value, that is, we have meaningful
partial information in the form of a positive lower bound on the
maximum. We will show how to exploit that and obtain a nonnegative
and unbiased estimator with lower variance than the
inverse-probability weight.
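To make the scenario concrete, the following Python sketch (hypothetical helper names; two values with independent sampling probabilities assumed) enumerates the four outcomes, checks that the inverse-probability weight is unbiased, and measures how much probability mass falls on outcomes where exactly one value is sampled, that is, where a positive lower bound on the maximum is available but ignored.

```python
def ht_max_two(sampled, v, p):
    """Inverse-probability estimate of max(v[0], v[1]) under independent sampling.

    `sampled` is a pair of booleans; the estimate is positive only when both
    values are sampled, because only then is the maximum known exactly.
    """
    if sampled[0] and sampled[1]:
        return max(v) / (p[0] * p[1])
    return 0.0


def outcomes(p):
    """The four (sampled, probability) pairs of independent sampling of two values."""
    for s1 in (False, True):
        for s2 in (False, True):
            yield (s1, s2), (p[0] if s1 else 1 - p[0]) * (p[1] if s2 else 1 - p[1])


v, p = (3.0, 1.0), (0.5, 0.5)
mean = sum(q * ht_max_two(s, v, p) for s, q in outcomes(p))
var = sum(q * ht_max_two(s, v, p) ** 2 for s, q in outcomes(p)) - mean ** 2
# Probability that exactly one value is sampled: a positive lower bound on the
# maximum is then available, but the inverse-probability weight returns 0.
wasted = sum(q for s, q in outcomes(p) if s[0] != s[1])
```

Here the estimate is unbiased (mean 3) with variance 27, and half of the probability mass lands on informative outcomes that the estimator discards.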

We distinguish between independent weighted sampling schemes
according to the "reproducibility" of the randomization used: with
known (unknown) seeds, the random hash functions used in sampling
each instance are (are not) available to the estimator. We
show that knowledge of seeds substantially increases estimation
power: we provide nonnegative unbiased estimators for the maximum
when seeds are known and show that when seeds are unknown,
there is no such estimator even when there are only two
values and the domain is Boolean (in which case the maximum is the
OR of two bits). Our negative result for unknown seeds agrees with
prior work that (implicitly) assumes "unknown seeds," such as [7],
which showed that most of the data needs to be sampled in order to
obtain, with constant probability, a small-error estimate of the distinct
element count (which is a sum aggregate of OR). "Known seeds"
sampling, however, can be easily incorporated when streaming or
otherwise processing the full data set. We demonstrate its benefit when
independent weighted samples of instances might be used post hoc
for estimates of multi-instance queries. While reproducible
randomization was extensively used as a means to coordinate samples,
we believe that its potential to enhance the usefulness of independent
weighted samples was not previously properly understood.

Overview: Section 2 characterizes all sample distributions that are
consistent with the constraints on summarization of dispersed
values. In Section 3 we propose methods to obtain optimal estimators,
which we apply in Sections 4 and 5 in example derivations. In
Section 4 we consider weight-oblivious Poisson sampling of keys and
independent sampling of instances and derive two Pareto optimal
estimators for the maximum, one catering for data where values of
a key are similar across instances and one where variation is large.
Weighted sampling (with known seeds) is studied in Section 5, and
we derive optimal estimators for the maximum and Boolean OR
over two instances. Section 6 contains negative results for
independently sampled instances with unknown seeds: We show that
there are no unbiased nonnegative estimators for maximum and for
absolute difference, even when data is binary.

In terms of an instances × keys data matrix, Sections 2 to 6
consider functions over the values $\mathbf{v} = (v_1, \ldots, v_r)$ of a single key
(i.e., column) in $r$ dispersed instances. To estimate sum aggregates
over multiple selected keys, we sum individual estimates for the
selected keys. For example, to estimate distinct element count, we
apply an OR estimator for each key and sum these estimates.
Section 7 overviews the application of single-key estimators to sum
aggregates. Applications to distinct count and max dominance are
provided in Section 8.
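A minimal sketch of this per-key summation for distinct count, assuming two instances, weight-oblivious Poisson sampling with probabilities `p1` and `p2`, and, as a simple stand-in for the optimal OR estimators derived later, the inverse-probability OR estimate that is positive only when both entries of a key are sampled. Function names are hypothetical. By linearity of expectation, the sum of unbiased per-key estimates is unbiased for the union size, which the exhaustive enumeration checks.

```python
import itertools


def or_ht(outcome, p1, p2):
    """Per-key inverse-probability estimate of OR(b1, b2).

    `outcome` is (b1 or None, b2 or None), with None marking an unsampled entry;
    the estimate is positive only when both entries are sampled.
    """
    b1, b2 = outcome
    if b1 is None or b2 is None:
        return 0.0
    return float(b1 or b2) / (p1 * p2)


def distinct_count_estimate(outcomes, p1, p2):
    """Sum of per-key OR estimates: an estimate of the size of the union."""
    return sum(or_ht(o, p1, p2) for o in outcomes.values())


def exact_expectation(bits, p1, p2):
    """E[distinct_count_estimate], enumerating every combination of per-key outcomes."""
    keys = list(bits)
    per_key = []
    for k in keys:
        b1, b2 = bits[k]
        per_key.append([
            ((b1 if s1 else None, b2 if s2 else None),
             (p1 if s1 else 1 - p1) * (p2 if s2 else 1 - p2))
            for s1 in (False, True) for s2 in (False, True)
        ])
    total = 0.0
    for combo in itertools.product(*per_key):
        prob = 1.0
        for _, q in combo:
            prob *= q
        outcomes = {k: o for k, (o, _) in zip(keys, combo)}
        total += prob * distinct_count_estimate(outcomes, p1, p2)
    return total


# Key -> (present in instance 1, present in instance 2); the union has 3 keys.
bits = {"x": (1, 0), "y": (1, 1), "z": (0, 1), "w": (0, 0)}
expected = exact_expectation(bits, 0.5, 0.5)  # equals the distinct count, 3
```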

2. SAMPLING DISPERSED VALUES 



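The data is represented by a vector $\mathbf{v} = (v_1, \ldots, v_r) \in V$ where
$V \subseteq V_1 \times \cdots \times V_r$, and we are interested in the value of a function
$f(\mathbf{v})$. Examples include the value $v_i$ of the $i$th entry, the $\ell$th largest
entry $\ell^{\mathrm{th}}(\mathbf{v})$, the maximum $\max(\mathbf{v}) = \max_{i \in [r]} v_i$, the minimum
$\min(\mathbf{v}) = \min_{i \in [r]} v_i$, the range $\mathrm{RG}(\mathbf{v}) = \max(\mathbf{v}) - \min(\mathbf{v})$, and the
exponentiated range $\mathrm{RG}_d(\mathbf{v}) = \mathrm{RG}(\mathbf{v})^d$ for $d > 0$. The domain $V$
can be the nonnegative orthant of $\mathbb{R}^r$ or $\{0, 1\}^r$.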
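For concreteness, the example functions above can be written directly in Python (a sketch; the names `kth_largest`, `rg`, and `rg_d` are hypothetical). On Boolean vectors, the built-in `max` is the OR of the entries.

```python
def kth_largest(v, ell):
    """The ell-th largest entry: ell = 1 gives the maximum, ell = len(v) the minimum."""
    return sorted(v, reverse=True)[ell - 1]


def rg(v):
    """Range RG(v) = max(v) - min(v)."""
    return max(v) - min(v)


def rg_d(v, d):
    """Exponentiated range RG_d(v) = RG(v) ** d, for d > 0."""
    return rg(v) ** d


v = (4.0, 1.0, 2.5)
# On a Boolean vector such as (0, 1, 0), max(...) is the OR of the bits.
```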

For a subset $V' \subseteq V$ of data vectors we define $\underline{f}(V') = \inf\{f(\mathbf{v}) \mid \mathbf{v} \in V'\}$ and $\overline{f}(V') = \sup\{f(\mathbf{v}) \mid \mathbf{v} \in V'\}$, the lowest and highest
values of $f$ on $V'$.

We see a random sample $S \subseteq [r]$ of the entries of $\mathbf{v}$. The sample
distribution is subject to the constraint that the inclusion of $i$ in $S$
is independent of the values $v_j$ for $j \neq i$. This is formalized as
follows: There is a probability distribution $\mathcal{T}$ over a sample space
$\Omega$ of predicates $\sigma = (\sigma_1, \ldots, \sigma_r)$, where $\sigma_i$ has domain $V_i$. The
sample $S = S(\sigma, \mathbf{v})$ is a function of the predicate vector $\sigma$ and
the data vector $\mathbf{v}$ and includes $i \in [r]$ if and only if $\sigma_i(v_i)$ is true:
$$i \in S \iff \sigma_i(v_i).$$

Two special cases are:

• Weight-oblivious sampling, where inclusion of $i$ in $S$ is independent
of $v_i$. The predicates $\sigma_i$ are constants (0 or 1) and entry $i$
is sampled if and only if $\sigma_i = 1$ (which happens with probability
$p_i = \mathrm{E}[\sigma_i]$).

• Weighted sampling, where the inclusion probability of each $i$ is
non-decreasing with $v_i$ (in particular, $v_i = 0 \Rightarrow i \notin S$).
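The predicate model can be sketched as follows, using 0-based entry indices and hypothetical threshold values: a sample is the set of entries whose predicate accepts their own value, so inclusion of entry $i$ never looks at $v_j$ for $j \neq i$.

```python
def sample(predicates, v):
    """S(sigma, v): the entries i (0-based) whose predicate sigma_i accepts v_i."""
    return {i for i, (sigma_i, v_i) in enumerate(zip(predicates, v)) if sigma_i(v_i)}


# Weight-oblivious: constant predicates that ignore the value.
oblivious = [lambda x: True, lambda x: False, lambda x: True]

# Weighted: threshold predicates sigma_i(x) = (x >= tau_i); hypothetical thresholds.
thresholds = [0.5, 2.0, 1.0]
weighted = [lambda x, t=t: x >= t for t in thresholds]

v = (0.0, 3.0, 0.7)
```

With these predicates, `sample(oblivious, v)` is `{0, 2}` regardless of the values, while `sample(weighted, v)` is `{1}`; with positive thresholds, a zero value is never sampled.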

Weighted sampling is important when the sample is used to estimate
functions that increase with the data values. The predicates $\sigma_i$
are increasing functions that can be specified in terms of a transition
threshold value $\tau_i$:
$$i \in S \iff \sigma_i(v_i) \iff v_i \ge \tau_i.$$

We find it convenient to specify weighted sampling distributions
using non-decreasing functions $\tau_i : [0, 1] \to V_i$, $i \in [r]$, and a random
seed vector $\mathbf{u} \in [0, 1]^r$ so that $u_i \in [0, 1]$ is uniformly distributed,
with the interpretation that
$$i \in S \iff v_i \ge \tau_i(u_i).$$
The inclusion probability of $i$ is $\mathrm{PR}[v_i \ge \tau_i(u_i)] = \sup\{u \in [0, 1] \mid v_i \ge \tau_i(u)\}$.

Weighted sampling is PPS (Probability Proportional to Size) when
$\tau_i = u_i \tau^*_i$, where $\boldsymbol{\tau}^*$ is a fixed vector. With PPS sampling, $i$ is
sampled with probability $\min\{1, v_i/\tau^*_i\}$.
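A sketch of PPS inclusion with a seed, assuming thresholds $\tau_i(u_i) = u_i \tau^*_i$ and the inclusion rule $v_i \ge \tau_i(u_i)$; the function names are hypothetical. Counting the fraction of a uniform grid of seeds that include the entry recovers $\min\{1, v_i/\tau^*_i\}$.

```python
def pps_included(v_i, tau_star_i, u_i):
    """PPS with seed u_i: include entry i iff v_i >= tau_i(u_i) = u_i * tau_star_i."""
    return v_i >= u_i * tau_star_i


def pps_inclusion_probability(v_i, tau_star_i):
    """PR[v_i >= u * tau_star_i] for u uniform on [0, 1], i.e. min(1, v_i / tau_star_i)."""
    return min(1.0, v_i / tau_star_i)


def grid_inclusion_fraction(v_i, tau_star_i, n=10_000):
    """Fraction of n evenly spaced seeds in (0, 1) for which the entry is included."""
    return sum(pps_included(v_i, tau_star_i, (j + 0.5) / n) for j in range(n)) / n
```

For example, with $\tau^* = 8$ a value of 2 is included for seeds up to 0.25, so about a quarter of the grid, and a value of 10 is always included.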

Independent (Poisson) sampling is when entries are sampled
independently, that is, the seeds $u_i$ are independent. In the general
model, $\mathcal{T}$ is a product distribution and $\sigma_i$ is independent of all $\sigma_j$
for $j \neq i$.

Shared-seed (coordinated) sampling is when the entries of the
seed vector are identical: $u_1 = \cdots = u_r = u$, where $u \in [0, 1]$ is
selected uniformly at random.
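The difference between independent and shared seeds can be seen in a short sketch, assuming PPS with the same $\tau^*$ for every entry (a simplifying assumption, not part of the model above). With a shared seed, sampled sets are nested by value: if an entry is sampled, so is every entry with a value at least as large. Independent seeds do not guarantee this. Helper names are hypothetical.

```python
import random


def pps_sample(v, tau_star, seeds):
    """Entries i with v[i] >= seeds[i] * tau_star (PPS thresholds, one tau_star for all)."""
    return {i for i, (x, u) in enumerate(zip(v, seeds)) if x >= u * tau_star}


def draw_seeds(r, rng, shared):
    """Shared seed u_1 = ... = u_r (coordinated) or independent seeds (Poisson)."""
    if shared:
        return [rng.random()] * r
    return [rng.random() for _ in range(r)]


def nested(v, sampled):
    """True if every entry at least as large as some sampled entry is also sampled."""
    return all(j in sampled for i in sampled for j in range(len(v)) if v[j] >= v[i])


rng = random.Random(0)
v = (1.0, 2.0, 3.0)
coordinated = [pps_sample(v, 4.0, draw_seeds(3, rng, shared=True)) for _ in range(1000)]
independent = [pps_sample(v, 4.0, draw_seeds(3, rng, shared=False)) for _ in range(1000)]
```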

2.1 Estimators 

An estimator $\hat f(S)$ of $f(\mathbf{v})$ is a function applied to the outcome
$S$ (sampled entries and their values). The estimator depends on
the domain $V$ and distribution $\mathcal{T}$. When sampling is weighted, we
distinguish between two models, depending on whether the seeds (the
random predicate vector $\sigma$ in the general model or the seed vector
$\mathbf{u}$) are available to the estimator. From the seed we can reveal
information on values of entries that are not included in the sample:
If $i \notin S$, we know that $v_i < \tau_i(u_i)$ ($v_i \in \sigma_i^{-1}(0)$ in the general
model).

With an outcome $S$, we associate a set $V^*(S) \subseteq V$ of all
data vectors consistent with this outcome. In the discrete case,
$V^*(S) = \{\mathbf{v} \in V \mid \mathrm{PR}[S \mid \mathbf{v}] > 0\}$. Otherwise, $V^*(S)$ contains
all vectors for which the probability density for the outcome
is positive. When seeds are not known, $V^*(S)$ contains $\mathbf{v}$ if and
only if the probability density of $S(\sigma, \mathbf{v})$ (where $\sigma \in \Omega$) is
positive for our outcome $S$. With known seeds, $\sigma$ is available, and
hence $\mathbf{v} \in V^*(S)$ if and only if the outcome matches $S(\sigma, \mathbf{v})$.
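With known seeds, the consistent set $V^*(S)$ can be described entry by entry. The sketch below assumes independent PPS with thresholds $u_i\tau^*_i$ and a nonnegative domain; the helper names are hypothetical. A sampled entry is pinned to its value, an unsampled entry is confined to $[0, u_i\tau^*_i)$, and membership of a candidate $\mathbf{v}$ is checked by re-running the sampling rule with the known seeds.

```python
def consistent_intervals(outcome, seeds, tau_star):
    """Per-entry description of V*(S) for independent PPS with known seeds.

    outcome[i] is the sampled value, or None if entry i was not sampled.
    Returns (low, high, high_inclusive) triples: a sampled entry is known exactly,
    an unsampled entry satisfies 0 <= v_i < u_i * tau_star[i].
    """
    intervals = []
    for value, u, t in zip(outcome, seeds, tau_star):
        if value is not None:
            intervals.append((value, value, True))
        else:
            intervals.append((0.0, u * t, False))
    return intervals


def is_consistent(v, outcome, seeds, tau_star):
    """v is in V*(S) iff re-running the sampling with the known seeds reproduces S."""
    for x, value, u, t in zip(v, outcome, seeds, tau_star):
        sampled = x >= u * t
        if sampled != (value is not None) or (sampled and x != value):
            return False
    return True
```

For instance, with seeds $(0.25, 0.5)$, $\tau^* = (8, 8)$, and outcome "entry 1 sampled with value 5, entry 2 not sampled", the vector $(5, 3)$ is consistent but $(5, 4)$ is not, since a value of 4 would have been sampled.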
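We seek estimators with some or all of the following properties:

unbiased: for all $\mathbf{v}$, $\mathrm{E}[\hat f \mid \mathbf{v}] = f(\mathbf{v})$.
nonnegative: $\hat f \ge 0$.

bounded variance: $\forall \mathbf{v}$, $\mathrm{VAR}[\hat f \mid \mathbf{v}] < \infty$.

dominance: We say that an estimator $\hat f^{(1)}$ dominates $\hat f^{(2)}$ if for all
data vectors $\mathbf{v}$, $\mathrm{VAR}[\hat f^{(1)} \mid \mathbf{v}] \le \mathrm{VAR}[\hat f^{(2)} \mid \mathbf{v}]$. An estimator $\hat f$ is
dominant (Pareto optimal) if there is no other unbiased nonnegative
estimator $\hat f'$ that dominates $\hat f$.

monotone: Nonnegative and non-decreasing with information: if
$V^*(S) \subseteq V^*(S')$, then $\hat f(S) \ge \hat f(S')$.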

Unbiasedness is particularly desirable when estimating sums by
summing individual estimates: When unbiased and independent (or
non-positively correlated) estimates are combined, the relative
error decreases. Nonnegativity is desirable when estimating a
nonnegative function $f \ge 0$, ensuring an estimate from the same
domain as the estimated quantity. If there is an estimator that
dominates all others, it is the only optimal one. If there isn't, we instead
aim for Pareto optimality. Monotonicity is an intuitive smoothness
requirement.

2.2 Horvitz Thompson estimator 

Suppose we are interested in estimating a function $f(v) \ge 0$
under "all or nothing" sampling, where either the value is sampled
and $v$ is known precisely, or it is not sampled and we know nothing
about $f(v)$. When the value is sampled, from the value $v$ and the
sample distribution we can compute the probability $p$ that the value
is sampled.

The HT estimator [29] of $f(v)$ applies inverse-probability
weighting: $\hat f = 0$ if the entry is not sampled and $\hat f = f(v)/p$ if
it is sampled. This estimator is clearly nonnegative, monotone, and
unbiased: $\mathrm{E}[\hat f] = (1 - p) \cdot 0 + p \cdot \frac{f(v)}{p} = f(v)$. The variance is
$$\mathrm{VAR}[\hat f] = f(v)^2 \left(\frac{1}{p} - 1\right). \qquad (1)$$

The HT estimator is optimal in that $\mathrm{VAR}[\hat f]$ is minimized for all
$v$ over all unbiased nonnegative estimators. Intuitively, this is
because an unbiased nonnegative estimator cannot be positive (with
nonzero probability) on outcomes that are consistent with $f(v) = 0$,
and variance is minimized when the same estimate value is used on all
outcomes in which the value is sampled.

Multi-entry $f$. The application of inverse-probability weights to
multi-entry functions is more delicate. We can use the set of
outcomes for which $S = [r]$, that is, all entries are sampled. For
these outcomes we know the data $\mathbf{v}$, and from $\mathcal{T}$ we can determine
$\mathrm{PR}[S = [r] \mid \mathbf{v}]$. The estimator is $f(\mathbf{v})/\mathrm{PR}[S = [r] \mid \mathbf{v}]$ if $S = [r]$
and 0 otherwise. This estimator is defined when $\mathrm{PR}[S = [r] \mid \mathbf{v}] > 0$.
With weighted sampling, however, "0"-valued entries are never
sampled, so we may have $\mathrm{PR}[S = [r] \mid \mathbf{v}] = 0$ when $f(\mathbf{v}) > 0$.
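The failure mode for weighted sampling is easy to see numerically. Assuming independent PPS, so that $\mathrm{PR}[S = [r] \mid \mathbf{v}] = \prod_i \min\{1, v_i/\tau^*_i\}$, the sketch below (hypothetical helper names) computes the expectation of the all-entries-sampled estimator. It equals $f(\mathbf{v})$ when every entry is positive but drops to 0 when some entry is 0, even though $\max(\mathbf{v}) > 0$.

```python
import math


def prob_all_sampled_pps(v, tau_star):
    """PR[S = [r] | v] under independent PPS: product of min(1, v_i / tau*_i)."""
    return math.prod(min(1.0, x / t) for x, t in zip(v, tau_star))


def ht_expectation(f, v, tau_star):
    """E[estimate | v] for the estimator f(v) / PR[S = [r] | v] on S = [r], and 0 otherwise.

    Equals f(v) when PR[S = [r] | v] > 0; when it is 0 the estimate is always 0.
    """
    p = prob_all_sampled_pps(v, tau_star)
    return p * (f(v) / p) if p > 0 else 0.0
```

For $\mathbf{v} = (2, 6)$ and $\tau^* = (4, 4)$ the expectation is $\max(\mathbf{v}) = 6$, whereas for $\mathbf{v} = (3, 0)$ it is 0 instead of 3.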

A broader definition of inverse-probability estimators [17, 18] is
with respect to a subset $S^*$ of all possible outcomes (over $\Omega$ and
$V$). The outcomes $S^*$ are those on which the estimator is positive.
The estimator is defined for $S^*$ if there exist two functions $f^*$ and
$p^*$ with domain $S^*$ that satisfy the following:

• for any outcome $S \in S^*$ and all $\mathbf{v} \in V^*(S)$, $f(\mathbf{v}) = f^*(S)$
and $\mathrm{PR}[S^* \mid \mathbf{v}] = p^*(S)$;

• for all $\mathbf{v} \in V$ with $f(\mathbf{v}) > 0$, $\mathrm{PR}[S^* \mid \mathbf{v}] > 0$.

The estimate is $\hat f(S) = 0$ if $S \notin S^*$ and $\hat f(S) = f^*(S)/p^*(S)$
otherwise. These functions, and hence the estimator, are unique for
$S^*$ if they exist. When $S^*$ is more inclusive, the respective
estimator has lower (or the same) variance on all data. We use the notation
$\hat f^{(HT)}$ for the estimator corresponding to the most inclusive $S^*$. A
sufficient condition for optimality of $\hat f^{(HT)}$ is that for all outcomes
$S \notin S^*$, $\underline{f}(V^*(S)) = 0$.

2.3 Necessary conditions for estimation 

Inverse-probability estimators are unbiased, nonnegative (when
$f$ is), and monotone. At most two different estimate values (zero
and possibly a positive value) are possible for a given data vector,
and thus variance is bounded. An inverse-probability estimator,
however, exists only if, for all data such that $f(\mathbf{v}) > 0$, there is
positive probability of recovering $f(\mathbf{v})$ from the outcome. This
requirement excludes basic functions such as RG over weighted samples:
When the data has at least one positive and one zero entry,
there is zero probability of recovering the exact value of $\mathrm{RG}(\mathbf{v})$
from the outcome. A nonnegative, unbiased, and bounded-variance
RG estimator, however, was presented in [17, 18].

Aiming for a broader understanding of when an estimator with
these properties exists, we derive some necessary conditions. For
a set of outcomes, determined by a portion $\Omega' \subseteq \Omega$ of the sample
space and a data vector $\mathbf{v}$, we define
$$V^*(\Omega', \mathbf{v}) = \bigcap_{\sigma \in \Omega'} V^*(S(\sigma, \mathbf{v})),$$
the set of all vectors that are consistent with all outcomes
determined by $\Omega'$ and $\mathbf{v}$.

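For $\mathbf{v}$ and $\varepsilon > 0$, we define $A(\mathbf{v}, \varepsilon) = 1$ if $\forall \sigma \in \Omega$, $\underline{f}(V^*(S(\sigma, \mathbf{v}))) > f(\mathbf{v}) - \varepsilon$,
and otherwise
$$A(\mathbf{v}, \varepsilon) = 1 - \sup\left\{\mathrm{PR}[\Omega'] \;\middle|\; \Omega' \subseteq \Omega,\ \underline{f}(V^*(\Omega', \mathbf{v})) \le f(\mathbf{v}) - \varepsilon\right\}. \qquad (2)$$

That is, we look for $\Omega'$ of maximum probability such that if we consider
all vectors $\mathbf{v}' \in V^*(\Omega', \mathbf{v})$ that are consistent with $\mathbf{v}$ on $\Omega'$, the
infimum of $f$ over $V^*(\Omega', \mathbf{v})$ is at most $f(\mathbf{v}) - \varepsilon$. We define $A(\mathbf{v}, \varepsilon)$
as the probability $\mathrm{PR}[\Omega \setminus \Omega']$ of not being in that portion.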
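As an illustration of our own, under assumptions not made in the text (weight-oblivious Poisson sampling, a nonnegative domain, and $f = \max$), $A(\mathbf{v}, \varepsilon)$ can be computed by brute force over all subsets $\Omega'$ of the $2^r$ predicate vectors. In this setting $V^*(\Omega', \mathbf{v})$ consists of the vectors that agree with $\mathbf{v}$ on every entry sampled by some member of $\Omega'$, so the infimum of the maximum over it is the largest such entry. For $0 < \varepsilon \le \max(\mathbf{v})$ the result matches the closed form $1 - \prod_{i : v_i > \max(\mathbf{v}) - \varepsilon}(1 - p_i)$, which is at least $\min_i p_i$. Helper names are hypothetical.

```python
import itertools


def outcome_prob(sampled, p):
    """Probability that exactly the entries in `sampled` are sampled (weight-oblivious Poisson)."""
    prob = 1.0
    for i, p_i in enumerate(p):
        prob *= p_i if i in sampled else 1.0 - p_i
    return prob


def a_brute_force(v, p, eps):
    """A(v, eps) of Eq. (2) for f = max on nonnegative data; assumes 0 < eps <= max(v).

    Each predicate vector is identified with the set of entries it samples. For a set
    Omega' of them, V*(Omega', v) holds the vectors agreeing with v on all entries
    sampled by some member, so inf max over it is the largest such entry (0 if none).
    """
    r = len(v)
    omega = [frozenset(c) for k in range(r + 1) for c in itertools.combinations(range(r), k)]
    best = 0.0
    for size in range(len(omega) + 1):
        for subset in itertools.combinations(omega, size):
            revealed = set().union(*subset)
            lower = max((v[i] for i in revealed), default=0.0)
            if lower <= max(v) - eps:
                best = max(best, sum(outcome_prob(s, p) for s in subset))
    return 1.0 - best


def a_closed_form(v, p, eps):
    """Same quantity in closed form: 1 - prod of (1 - p_i) over {i : v_i > max(v) - eps}."""
    prob = 1.0
    for v_i, p_i in zip(v, p):
        if v_i > max(v) - eps:
            prob *= 1.0 - p_i
    return 1.0 - prob
```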

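Lemma 2.1. A function $f$ has an estimator that is

• unbiased and nonnegative $\Rightarrow$
$$\forall \mathbf{v}, \forall \varepsilon > 0,\ A(\mathbf{v}, \varepsilon) > 0. \qquad (3)$$

• unbiased, nonnegative, and bounded variance $\Rightarrow$
$$\forall \mathbf{v},\ A(\mathbf{v}, \varepsilon) = \Omega(\varepsilon^2). \qquad (4)$$

• unbiased, nonnegative, and bounded $\Rightarrow$
$$\forall \mathbf{v},\ A(\mathbf{v}, \varepsilon) = \Omega(\varepsilon). \qquad (5)$$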

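Proof. The contribution of $\Omega'$ to the expectation of $\hat f$ must not
exceed $\underline{f}(V^*(\Omega', \mathbf{v}))$: if it did, then $\hat f$ would have to assume negative
values for the $\mathbf{v}' \in V^*(\Omega', \mathbf{v})$ with minimum $f(\mathbf{v}')$. Considering
a maximum $\Omega'$ with $\underline{f}(V^*(\Omega', \mathbf{v})) \le f(\mathbf{v}) - \varepsilon$, its contribution
to the expectation is at most $f(\mathbf{v}) - \varepsilon$, and the contribution of the
complement, which has probability $A(\mathbf{v}, \varepsilon)$, must be at least $\varepsilon$.
If $A(\mathbf{v}, \varepsilon) = 0$ then this is not possible, so (3) follows. The
expectation of the estimator over the complement is at least $\varepsilon/A(\mathbf{v}, \varepsilon)$,
and a bounded estimator cannot exceed its bound there; thus (5) is necessary.
The contribution of that complement to $\mathrm{E}[\hat f^2 \mid \mathbf{v}]$ is at least
$$A(\mathbf{v}, \varepsilon) \left(\frac{\varepsilon}{A(\mathbf{v}, \varepsilon)}\right)^2 = \frac{\varepsilon^2}{A(\mathbf{v}, \varepsilon)},$$
which implies that (4) is necessary. □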



3. PARETO OPTIMAL ESTIMATORS 

We formulate sufficient conditions for Pareto optimality, which 
form the basis of our estimator derivations. 

We start by seeking Pareto optimal estimators defined with respect
to an order $\prec$ over the set $V$ of all possible data vectors and
minimizing variance in an order-respecting way: The variance of
the estimator for a data vector $\mathbf{v}$ is minimized conditioned on the
values it assigned to outcomes consistent with vectors that precede
$\mathbf{v}$. This setup naturally yields estimators that are Pareto optimal.
Moreover, by selecting an order $\prec$ so that more likely vectors
appear earlier, we can tailor the estimator according to properties of
the data.

Order-based optimality $\hat f^{(\prec)}$: The first estimator we present, $\hat f^{(\prec)}$,
is the solution of a simple set of equations. A solution may not
exist, but when it does, it is unique and Pareto optimal.

We map an outcome $S$ to its $\prec$-minimal consistent data vector
$\phi(S) = \min_\prec V^*(S)$ (we assume it is well defined). We say that
$S$ is determined by the data vector $\mathbf{v} = \phi(S)$ and that $\mathbf{v}$ is the
determining vector of $S$. An outcome $S$ precedes $\mathbf{v}$ if it is determined
by some $\mathbf{z} \prec \mathbf{v}$.

For continuous spaces $V$ and $S$, we extend some assumptions
on the mapping $\phi$ from the discrete case: (i) for all $\mathbf{v}$, $\phi^{-1}(\mathbf{v})$ is
either empty or has positive probability; (ii) any subset of $\phi^{-1}(\mathbf{v})$
with zero probability for data $\mathbf{v}$ also has zero probability for data
$\mathbf{z} \succ \mathbf{v}$; and (iii) any positive-probability set of outcomes consistent
with $\mathbf{v}$ and determined by preceding vectors must include a positive-probability
subset of $\phi^{-1}(\mathbf{z})$ for some $\mathbf{z} \prec \mathbf{v}$.

For each vector $\mathbf{v}$, $\hat f^{(\prec)}$ has the same value on all outcomes
$S' = \phi^{-1}(\mathbf{v})$ determined by $\mathbf{v}$. Slightly abusing notation, we
define $\hat f^{(\prec)}(\mathbf{v}) = \hat f^{(\prec)}(S)$ for $S \in S'$ to be that value.

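We express $\hat f^{(\prec)}(\mathbf{v})$ as a function of $\hat f^{(\prec)}$ on the outcomes $S_0$
that precede $\mathbf{v}$. The dependence on preceding outcomes $S_0$ is
through their contribution $f_0$ to the expectation of the estimate of
$f(\mathbf{v})$. The estimate value $\hat f^{(\prec)}(\mathbf{v})$ is as follows: If $\mathrm{PR}[S' \mid \mathbf{v}] = 0$
and $f(\mathbf{v}) = f_0$, then $\hat f^{(\prec)}(\mathbf{v}) = 0$. If $\mathrm{PR}[S' \mid \mathbf{v}] = 0$ and $f(\mathbf{v}) \neq f_0$,
we declare failure. Else,
$$\hat f^{(\prec)}(\mathbf{v}) = \frac{f(\mathbf{v}) - f_0}{\mathrm{PR}[S' \mid \mathbf{v}]}. \qquad (6)$$

From the inverse-probability weights principle, this choice of $\hat f(S)$
for $S \in S'$ minimizes the variance $\mathrm{VAR}[\hat f \mid \mathbf{v}]$ for data vector $\mathbf{v}$
conditioned on the values $\hat f : S_0$.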

When the order $\prec$ enumerates all data vectors (all data vectors
have finite position in the order), we can compute $\hat f^{(\prec)}$
algorithmically: Algorithm 1 processes data vectors sequentially in increasing
$\prec$ order and computes $\hat f^{(\prec)}(\mathbf{v})$ when $\mathbf{v}$ is processed.

These constraints have no solution when for some $\mathbf{v}$, $f_0 \neq f(\mathbf{v})$
and $\mathrm{PR}[S' \mid \mathbf{v}] = 0$. Moreover, if $f_0 > f(\mathbf{v})$, there is no
nonnegative solution. When a solution $\hat f^{(\prec)}$ is well defined, however, it is
unbiased and Pareto optimal.



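Algorithm 1 $\hat f^{(\prec)}$
Require: $\prec$ is an order on $V$
 1: $S_0 \leftarrow \emptyset$   ▷ set of processed outcomes
 2: $V_0 \leftarrow \emptyset$   ▷ set of processed data vectors
 3: while $V_0 \neq V$ do
 4:   $\mathbf{v} \leftarrow \min_\prec (V \setminus V_0)$   ▷ a minimum unprocessed vector
 5:   $f_0 \leftarrow \mathrm{E}[\hat f^{(\prec)}(S) \mid S_0, \mathbf{v}]\,\mathrm{PR}[S_0 \mid \mathbf{v}]$   ▷ contribution of preceding outcomes to the estimate of $f(\mathbf{v})$
 6:   $S' \leftarrow \{S \mid \mathbf{v} \in V^*(S)\} \setminus S_0$   ▷ unprocessed outcomes consistent with $\mathbf{v}$
 7:   if $\mathrm{PR}[S' \mid \mathbf{v}] = 0$ then
 8:     if $f(\mathbf{v}) \neq f_0$ then
 9:       return "failure"   ▷ no unbiased estimator
10:     else
11:       $\forall S \in S'$, $\hat f^{(\prec)}(S) \leftarrow 0$
12:   else
13:     $\forall S \in S'$, $\hat f^{(\prec)}(S) \leftarrow \frac{f(\mathbf{v}) - f_0}{\mathrm{PR}[S' \mid \mathbf{v}]}$
14:   $V_0 \leftarrow V_0 \cup \{\mathbf{v}\}$
15:   $S_0 \leftarrow S_0 \cup S'$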
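Algorithm 1 can be run directly on a small finite domain. The Python sketch below is an illustration under assumed settings (weight-oblivious Poisson sampling of $r = 2$ entries with values in $\{0, 1, 2\}$, $f = \max$, and the order used for $\max^{(L)}$ in Section 4.1); helper names are hypothetical. Outcomes are tuples with `None` for unsampled entries, and the dictionary of already-assigned outcomes plays the role of $S_0$.

```python
import itertools


def outcomes_of(v):
    """Outcomes with positive probability given v under weight-oblivious Poisson
    sampling: each entry is either unsampled (None) or reveals its value."""
    return list(itertools.product(*[(None, x) for x in v]))


def prob(outcome, p):
    """PR[outcome | v] for an outcome consistent with v."""
    result = 1.0
    for value, p_i in zip(outcome, p):
        result *= (1.0 - p_i) if value is None else p_i
    return result


def order_based_estimator(vectors, f, p, tol=1e-12):
    """Algorithm 1 on a finite domain; `vectors` must be listed in increasing order.

    For each v, the not-yet-assigned outcomes consistent with v (the set S') get the
    common value (f(v) - f0) / PR[S' | v], where f0 is the contribution of the
    outcomes assigned earlier (the set S_0).
    """
    est = {}  # outcome -> estimate; the keys form S_0
    for v in vectors:
        consistent = outcomes_of(v)
        f0 = sum(prob(s, p) * est[s] for s in consistent if s in est)
        new = [s for s in consistent if s not in est]
        p_new = sum(prob(s, p) for s in new)
        if p_new == 0:
            if abs(f(v) - f0) > tol:
                raise ValueError(f"failure at {v}: no unbiased estimator")
            value = 0.0
        else:
            value = (f(v) - f0) / p_new
        for s in new:
            est[s] = value
    return est


def max_l_order(v):
    """Order for max^(L): the zero vector first, then by L(v), the number of entries
    strictly below the maximum (ties are broken arbitrarily by the stable sort)."""
    if max(v) == 0:
        return -1
    return sum(1 for x in v if x < max(v))


p = (0.5, 0.5)
vectors = sorted(itertools.product([0, 1, 2], repeat=2), key=max_l_order)
est = order_based_estimator(vectors, max, p)
# est[(2, 1)] is 4.0 = (v1 - (1 - p1) v2) / (p1 (p1 + p2 - p1 p2)) for v = (2, 1).
```

Every data vector in the domain is then estimated without bias, and all estimates are nonnegative for this order.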



Lemma 3.1. When $\hat f^{(\prec)}$ is well defined, it is unbiased and Pareto
optimal.

Proof. Pareto optimality: Consider an estimator $\hat f$ such that
for some $\mathbf{v}$, $\hat f \neq \hat f^{(\prec)}$ on a set of outcomes $T$ such that $\mathrm{PR}[T \mid \mathbf{v}] > 0$.
Let $\mathbf{v}$ be $\prec$-minimal with this property, and let $S_0$ and $S'$ be as
in our constraints, with respect to $\mathbf{v}$. From the definition of $\phi$, the set
$T$ (or a same-probability subset of it) must be contained in $S'$.

From $\prec$-minimality of $\mathbf{v}$ and our assumptions for continuous
spaces, we must have $\mathrm{E}[\hat f^{(\prec)} \mid S_0] = \mathrm{E}[\hat f \mid S_0]$, while $\hat f^{(\prec)} : S' \neq \hat f : S'$. The value assigned by $\hat f^{(\prec)}$ on the outcomes $S'$
is the unique choice which minimizes the variance for $\mathbf{v}$ subject to
$\hat f^{(\prec)} : S_0$, in the sense that any estimator that differs on a positive-probability
subset of $S'$ will have strictly higher variance. Hence,
$\mathrm{VAR}[\hat f \mid \mathbf{v}] > \mathrm{VAR}[\hat f^{(\prec)} \mid \mathbf{v}]$, and thus $\hat f$ cannot dominate $\hat f^{(\prec)}$.

Unbiasedness follows from the choice of $\hat f^{(\prec)}$ on the outcomes
$S'$ in line 13 of Algorithm 1: $\mathrm{E}[\hat f^{(\prec)}] = \mathrm{E}[\hat f^{(\prec)} \mid S']\,\mathrm{PR}[S'] + \mathrm{E}[\hat f^{(\prec)} \mid S_0]\,\mathrm{PR}[S_0] = f(\mathbf{v})$. □

Two vectors $\mathbf{v} \neq \mathbf{z}$ are dependent with respect to $\prec$ if $\mathrm{PR}[\phi^{-1}(\mathbf{v}) \mid \mathbf{z}] > 0$. Consider now a partial order $\prec'$ derived from $\prec$ by only retaining
relations between dependent vectors. Then all linearizations of
$\prec'$ have the same mapping of outcomes to determining vectors and
thus the resulting order-based estimators are identical. Conversely,
when a partial order $\prec'$ has the property that for all outcomes $S$,
$\min_{\prec'} V^*(S)$ is unique, we can specify the estimator $\hat f^{(\prec')}$ with
respect to it (same as using any linearization).

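Lemma 3.2. The estimator $\hat f^{(\prec)}$ is monotone if and only if for
any outcome $S$ and $\mathbf{v} \in V^*(S)$, the estimate on outcomes
determined by $\mathbf{v}$ is at least $\hat f^{(\prec)}(S)$:
$$\hat f^{(\prec)} \text{ is monotone} \iff \forall S,\ \forall \mathbf{v} \in V^*(S):\ \hat f^{(\prec)}(\mathbf{v}) \ge \hat f^{(\prec)}(S).$$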

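Proof. An outcome $S'$ with $V^*(S') = \{\mathbf{v}\}$ has $V^*(S') \subseteq V^*(S)$ and is
determined by $\mathbf{v}$. From monotonicity, we must have
$\hat f^{(\prec)}(\mathbf{v}) \ge \hat f^{(\prec)}(S)$. Conversely, consider two outcomes $S$ and $S'$
such that $V^*(S) \subseteq V^*(S')$. Let $\mathbf{v}'$ be the determining vector of $S'$
and $\mathbf{v}$ be the determining vector of $S$. We have that $\mathbf{v} \in V^*(S')$, hence
$\hat f^{(\prec)}(\mathbf{v}) = \hat f^{(\prec)}(S) \ge \hat f^{(\prec)}(S')$. □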
Forcing nonnegativity $\hat f^{(\prec+)}$: When the constraints specifying
$\hat f^{(\prec)}$ have no nonnegative solution, we can explicitly constrain the
setting of $\hat f^{(\prec+)} : S'$ to ensure that nonnegativity is not violated on
successive vectors:



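$$\min \sum_{S \in S'} \mathrm{PR}[S \mid \mathbf{v}]\,\bigl(\hat f(S) - f(\mathbf{v})\bigr)^2 \qquad (7)$$
$$\text{subject to}\quad \sum_{S \in S'} \mathrm{PR}[S \mid \mathbf{v}]\,\hat f(S) = f(\mathbf{v}) - f_0 \qquad (8)$$
$$\forall \mathbf{v}' \succ \mathbf{v}:\quad \sum_{S \in S' \cup S_0} \hat f(S)\,\mathrm{PR}[S \mid \mathbf{v}'] \le f(\mathbf{v}') \qquad (9)$$

We minimize variance (7) subject to unbiasedness (8) and not
violating nonnegativity for any $\mathbf{v}' \succ \mathbf{v}$ (9). The resulting estimator is
Pareto optimal if the solution of the system is unique.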

A solution $\hat f^{(\prec+)}$ satisfying nonnegativity constraints is
identical to $\hat f^{(\prec)}$ when the latter is defined and nonnegative. With the
$\hat f^{(\prec+)}$ formulation, the constraints (9) can make two vectors $\mathbf{v}$ and $\mathbf{z}$
dependent also when both sets of outcomes $\phi^{-1}(\mathbf{v})$ and $\phi^{-1}(\mathbf{z})$ have
positive probability for some data vector $\mathbf{y}$ that succeeds both $\mathbf{v}$
and $\mathbf{z}$. This is because when $\mathbf{v}$ precedes $\mathbf{z}$, the constraints (9) due
to $\mathbf{y}$ are less tight. As with $\hat f^{(\prec)}$, we can equivalently define $\hat f^{(\prec+)}$
with respect to a partial order $\prec'$ derived from $\prec$ by including only
relations between dependent data vectors, and all linearizations of $\prec'$ yield the same estimator.

Ordered partition $\hat f^{(\mathcal{U})}$: The order-based formulations, however,
in particular the more constrained $\hat f^{(\prec+)}$, can preclude symmetric
estimators. Symmetric estimators are naturally desirable when $f$
is symmetric (invariant under permuting coordinates). When two
symmetric vectors are dependent under $\prec$, the member that
$\prec$-precedes the other can have a strictly lower variance.

We therefore seek a more relaxed formulation that will allow us
to balance the variance of symmetric vectors.

We consider a setup where data vectors are partitioned into
ordered batches $\mathcal{U} = \{U_0, U_1, \ldots\}$. The estimator $\hat f^{(\mathcal{U})}$ prioritizes
earlier batches but "balances" the variance between vectors that are
members of the same batch. That is, the estimator is locally Pareto
optimal for each $U_i$, given $\hat f : S_0$, unbiasedness (8) for all $\mathbf{v} \in U_i$,
and nonnegativity (9) for all $\mathbf{v}' \in U_{>i}$. That is, under these
constraints, there is no other setting of $\hat f$ on $S'$ with smaller or equal
variance for all vectors in $U_i$ and a strictly smaller variance for at
least one vector. The estimator is Pareto optimal if at each
step $h$, when fixing the variance of all vectors in $U_h$, the solution
is unique. Symmetry (invariance to permutation of entries) can be
achieved by including all symmetric data vectors in the same part
and using a symmetric locally optimal estimator.

This is formulated in Algorithm 2, which processes $U_i$ at step
$i$, setting the estimator on all outcomes consistent with $U_i$ and not
consistent with any vector in $U_j$ for $j < i$.



4. POISSON: WEIGHT-OBLIVIOUS 

We now consider estimating $f(\mathbf{v})$ when sampling of entries is
weight-oblivious and Poisson: entry $i \in [r]$ is sampled
independently with probability $p_i > 0$.

The outcome $S \subseteq [r]$ includes the sampled entries and, for each
sampled entry $i \in S$, the value $v_i$.

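Algorithm 2 $\hat f^{(\mathcal{U})}$
Require: $U_0, U_1, \ldots$ is a partition of $V$
1: $S_0 \leftarrow \emptyset$   ▷ set of processed outcomes
2: for $h = 0, 1, 2, \ldots$ do   ▷ $h$ is the index of the current part to process
3:   $S' \leftarrow \{S \mid U_h \cap V^*(S) \neq \emptyset\} \setminus S_0$   ▷ unprocessed outcomes consistent with $U_h$
4:   Compute a locally optimal estimator for $U_h$, extending $\hat f$ on $S_0$ and satisfying
     $\forall \mathbf{v}',\ \sum_{S \in S' \cup S_0} \hat f(S)\,\mathrm{PR}[S \mid \mathbf{v}'] \le f(\mathbf{v}')$
5:   $S_0 \leftarrow S_0 \cup S'$

The inverse-probability estimate $\hat f^{(HT)}(S) = f(\mathbf{v})/\prod_{i \in [r]} p_i$
when $S = [r]$ (all entries are sampled), and $\hat f^{(HT)}(S) = 0$ otherwise,
is defined for all $f$ and, from (1), has variance
$$\mathrm{VAR}[\hat f^{(HT)}] = f(\mathbf{v})^2 \left(\frac{1}{\prod_{i \in [r]} p_i} - 1\right). \qquad (10)$$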



This estimator is the optimal inverse-probability estimator for
quantiles and range: The set of outcomes $S^*$ which contains all
outcomes with $|S| = r$ is the most inclusive set for which we can
determine both the value $f(\mathbf{v})$ and $\mathrm{PR}[S^* \mid \mathbf{v}]$ (see Section 2.2). The
estimators $\mathrm{RG}^{(HT)}$ ($r = 2$) and $\min^{(HT)}$ are even (Pareto)
optimal: this is because any nonnegative estimator must have $\hat f(S) = 0$
on outcomes $S$ consistent with data vectors with $f(\mathbf{v}) = 0$, which
includes all outcomes with $|S| < r$ for these two functions.
Considering all estimators that assume positive values only when $|S| = r$,
variance is minimized when using a fixed value. The estimator
$\hat f^{(HT)}$, however, is not optimal for all other quantiles ($\ell^{\mathrm{th}}$ when
$\ell < r$) or for RG when $r > 2$.

We present optimal estimators for max and Boolean OR: the
monotone estimators $\max^{(L)}$ and $\mathrm{OR}^{(L)}$, which prioritize dense
data vectors, and the estimators $\max^{(U)}$ and $\mathrm{OR}^{(U)}$, which prioritize
sparse vectors.



4.1 Estimator $\max^{(L)}$

We compute the estimator $\hat f^{(\prec)}$ (Algorithm 1) with respect to the
following partial order $\prec$: The data vector $\mathbf{0}$ precedes all others,
that is, $\forall \mathbf{v} \in V$, $\mathbf{0} \preceq \mathbf{v}$. Otherwise, $\prec$ corresponds to the numeric
order on $L(\mathbf{v}) = |\{j \in [r] \mid v_j < \max_{i \in [r]} v_i\}|$ (the number
of entries strictly lower than the maximum one): $\mathbf{v} \prec \mathbf{w} \iff L(\mathbf{v}) < L(\mathbf{w})$.

For an outcome $S$, the set $V^*(S)$ includes all vectors that agree
with the outcome on sampled entries: $\mathbf{v}' \in V^*(S) \iff \forall i \in S,\ v'_i = v_i$.

The determining vector $\phi(S)$ of an outcome $S$ is $\min_\prec V^*(S)$:
$\phi(S) = \mathbf{0}$ if $\forall i \in S$, $v_i = 0$ (in particular, if $S = \emptyset$). Otherwise,
$\phi(S)_j = v_j$ if $j \in S$ and $\phi(S)_j = \max_{i \in S} v_i$ otherwise. The
mapping $\phi(S)$ is well defined by $\prec$, which means that the estimator
$\hat f^{(\prec)}$ (if defined) is unique. Because $\prec$ is symmetric (invariant to
permutation of entries), so is $\hat f^{(\prec)}$.

Our choice of $\prec$ aims at obtaining a monotone estimator through
conservative (low) estimate values: The determining vector of an
outcome $S$ has all unsampled entries set to the maximum value of
a sampled entry (this value is also the lower bound $\underline{f}(V^*(S))$).
The optimal estimate value for this vector on $S$ would be lower
than if we had a determining vector with lower entries and the same
maximum, because the outcome on such a vector is more likely to
have a lower maximum sampled entry, meaning a lower estimate
on such outcomes, which needs to be compensated for by a higher
estimate on $S$.

For the minimum vector $\mathbf{0}$, there are no preceding outcomes
($S_0 = \emptyset$), and we can directly compute $\max^{(L)}(\mathbf{0})$ (the estimate
for all outcomes that have $\phi(S) = \mathbf{0}$), obtaining $\max^{(L)} = 0$ on
all outcomes $S$ such that $\forall i \in S$, $v_i = 0$.

We can now proceed and compute the estimator for all outcomes
$S$ with determining vector $\mathbf{v}$ such that $L(\mathbf{v}) = 0$, that is, outcomes
where at least one entry is sampled, has positive value, and all other
sampled entries have the same value: $\forall i \in S$, $v_i = \max_{j \in S} v_j > 0$.
The probability of such an outcome given data vector $\mathbf{v}$ with
$L(\mathbf{v}) = 0$ is the probability that at least one entry is sampled:
$1 - \prod_{i \in [r]}(1 - p_i)$. The estimate value is accordingly
$$\max^{(L)}(S) = \frac{\max_{i \in S} v_i}{1 - \prod_{i \in [r]}(1 - p_i)}. \qquad (11)$$



Maximum over two instances ($r = 2$). We have $\max^{(L)} = 0$ on outcomes consistent with data $(0,0)$ and, from (11), $\max^{(L)} = v/(p_1 + p_2 - p_1p_2)$ for outcomes consistent with data with two equal positive entries ($S = \{1\}$, $S = \{2\}$, or $S = \{1,2\}$, and $v_1 = v_2 = v$). We now consider data vectors where $v_2 < v_1$ (the other case, $v_1 < v_2$, is symmetric). The estimate is already computed on outcomes where exactly one entry is sampled. These and the empty outcome are in $S_0$. The outcomes $S'$ are those where both entries are sampled, and hence $\Pr[S'] = p_1p_2$. To be unbiased, the estimate $x$ must satisfy the linear equation (line 3 of Algorithm 1):

$$\max\{v_1, v_2\} = p_1p_2\,x + p_1(1-p_2)\frac{v_1}{p_1+p_2-p_1p_2} + p_2(1-p_1)\frac{v_2}{p_1+p_2-p_1p_2}.$$



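Solving and summarizing, we obtain (for $v_1 \ge v_2$):

Outcome S        | $\max^{(L)}(S)$
$S = \emptyset$  | $0$
$S = \{1\}$      | $\frac{v_1}{p_1+p_2-p_1p_2}$
$S = \{2\}$      | $\frac{v_2}{p_1+p_2-p_1p_2}$
$S = \{1,2\}$    | $\frac{v_1 - (1-p_1)v_2}{p_1(p_1+p_2-p_1p_2)}$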



Expressing the estimator as a function of the determining vector, assuming $v_1 \ge v_2$ (the other case is symmetric), we obtain:

$$\max^{(L)}(v) = \frac{v_1}{p_1(p_1+p_2-p_1p_2)} - v_2\,\frac{1-p_1}{p_1(p_1+p_2-p_1p_2)}. \tag{12}$$
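As a sanity check of (12), the sketch below enumerates the four sample patterns for $r = 2$ and computes the exact mean and variance of $\max^{(L)}$ and of $\max^{(HT)}$. It assumes independent weight-oblivious sampling with inclusion probabilities $p_1, p_2$; all function names are illustrative, not from the paper.

```python
from itertools import product
from typing import Callable, Sequence, Tuple


def max_L_r2(phi: Tuple[float, float], p1: float, p2: float) -> float:
    """Eq. (12) applied to a determining vector phi (either entry may be larger)."""
    a, b = phi
    s = p1 + p2 - p1 * p2
    if a >= b:
        return (a - (1 - p1) * b) / (p1 * s)
    return (b - (1 - p2) * a) / (p2 * s)


def est_L(v, sigma, p) -> float:
    """Apply max^(L) to the outcome produced by data v and sample pattern sigma."""
    top = max((v[i] for i in range(2) if sigma[i]), default=0.0)
    phi = tuple(v[i] if sigma[i] else top for i in range(2))
    return max_L_r2(phi, p[0], p[1])


def est_HT(v, sigma, p) -> float:
    """Inverse-probability estimate: positive only when both entries are sampled."""
    return max(v) / (p[0] * p[1]) if all(sigma) else 0.0


def moments(v: Sequence[float], p: Sequence[float],
            estimator: Callable) -> Tuple[float, float]:
    """Exact mean and variance over all 2^2 sample patterns."""
    mean = second = 0.0
    for sigma in product((0, 1), repeat=2):
        prob = 1.0
        for s_i, p_i in zip(sigma, p):
            prob *= p_i if s_i else 1.0 - p_i
        x = estimator(v, sigma, p)
        mean += prob * x
        second += prob * x * x
    return mean, second - mean * mean


m_L, var_L = moments((1.0, 0.4), (0.3, 0.6), est_L)
m_HT, var_HT = moments((1.0, 0.4), (0.3, 0.6), est_HT)
print(m_L, m_HT, var_L <= var_HT)
```

Both means equal $\max(v)$ up to floating-point rounding, and the variance of $\max^{(L)}$ does not exceed that of $\max^{(HT)}$, in line with Lemma 4.1.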



Lemma 4.1. The estimator $\max^{(L)}$ is Pareto optimal, monotone, nonnegative, and dominates the estimator $\max^{(HT)}$.

Proof. Pareto optimality follows from the $f^{(\prec)}$ derivation. For monotonicity, we observe that determining vectors of more informative outcomes (outcomes with more entries sampled) have an equal-or-larger maximum entry $v_1$ or an equal-or-smaller minimum entry $v_2$. It therefore suffices that (12) is nondecreasing in $v_1$ and nonincreasing in $v_2$, which clearly holds as the coefficient of $v_1$ in (12) is positive and that of $v_2$ is negative. Nonnegativity follows from monotonicity and the fact that the estimate is 0 when $S = \emptyset$.



The estimator $\max^{(HT)}$ assumes values $0$ or $\max(v_1,v_2)/(p_1p_2)$ and thus maximizes variance amongst all unbiased estimators with values in the same range. Hence, to establish dominance over $\max^{(HT)}$, it suffices to show that on data $v$, $\max^{(L)}(v) \le \max(v_1,v_2)/(p_1p_2)$, which is immediate from (12), since $p_1+p_2-p_1p_2 \ge p_2$ and the $v_2$ term is nonpositive. □



Multiple instances: $\max^{(L)}$ for $r > 2$.

A sorting permutation of a vector $v$ is a permutation $\pi$ of $[r]$ such that $v_{\pi_1} \ge \cdots \ge v_{\pi_r}$. We use the notation $\pi(v) = (v_{\pi_1}, \ldots, v_{\pi_r})$.

We prove that the estimator $\max^{(L)}$ applied to an outcome $S$ can be expressed as a linear combination of the sorted entries of the determining vector $\phi(S)$. The coefficients depend on an accordingly permuted probability vector. When there are multiple entries of equal value, the sorting permutation is not unique. We show, however, that the estimator is invariant to the particular sorting permutation used.



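Theorem 4.1. For every outcome $S$,

$$\max^{(L)}(S) = \sum_{i=1}^{r} \alpha_{i,\pi(p)}\,\phi(S)_{\pi_i}, \tag{13}$$

where $\pi$ is the sorting permutation of $\phi(S)$ and $\alpha_{i,q}$ are rational expressions in $q_1, \ldots, q_r$ that are always defined when $q_i \in (0,1]$. Moreover, the coefficients' prefix sums

$$A_{h,q} = \sum_{i=1}^{h} \alpha_{i,q} \tag{14}$$

are symmetric rational expressions in $q_i$ for $i \in [h]$ and in $q_i$ for $i \in [r] \setminus [h]$.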

Proof. We first show that the symmetry property of the prefix sums implies that the estimate does not depend on the choice of sorting permutation (when it is not unique). It suffices to show this for a sorted $v$ such that $v_j = v_{j+1}$, and show that when symmetry holds, the estimator is the same for the identity permutation $(1, \ldots, r)$ and the permutation $(1, \ldots, j-1, j+1, j, j+2, \ldots, r)$ (exchanging positions $j$ and $j+1$); both are sorting permutations of $v$. The argument can be applied repeatedly if there are more than two equal entries.

Let $v$ be sorted and let $\delta_i = v_{i+1} - v_i$ for $i \in [r-1]$. We can rewrite (13) as

$$\sum_{i=1}^{r} \alpha_{i,p}\,v_i = A_{r,p}\,v_r - \sum_{i=1}^{r-1} \delta_i A_{i,p}.$$

When $\delta_j = 0$, let $p$ and $p'$ respectively be the original and permuted vectors, with $p_j$ and $p_{j+1}$ exchanged. By symmetry, $A_{i,p} = A_{i,p'}$ for $i \in [r] \setminus \{j\}$. But $\delta_j = 0$, and hence the estimator is the same with both permutations.

We now show that the estimator has the form (13) and that the prefix sums satisfy the symmetry property. For $v$ with sorting permutation $\pi$ and $L(v) = k$, we can rewrite (13) as

$$A_{r-k,\pi(p)}\,v_{\pi_1} + \sum_{i=r-k+1}^{r} \left(A_{i,\pi(p)} - A_{i-1,\pi(p)}\right) v_{\pi_i}. \tag{15}$$

For all outcomes $S$ consistent with $v$, $L(\phi(S)) \le L(v) = k$. Thus, the estimator for data $v$ is fully specified by the prefix sums $A_{h,\pi(p)}$ with $h \ge r - L(v)$.

We show by induction on $k \ge 0$ that the estimator can be expressed in this form for data vectors with $L(v) \le k$. For the base case of the induction ($k = 0$), it suffices to specify the rational expression $A_{r,p}$. By substituting a determining vector with all entries equal in (13) and equating with (11), we obtain

$$A_{r,p} = \frac{1}{1 - \prod_{i \in [r]}(1 - p_i)}, \tag{16}$$

which specifies $\max^{(L)}(v)$ for all determining vectors $v$ such that $L(v) = 0$ (all entries are equal and positive) and thus specifies the estimator correctly for all data vectors with $L(v) = 0$. Symmetry clearly holds, as $A_{r,\pi(p)}$ is independent of the particular permutation $\pi$.

In the induction step, we assume that the rational expressions $A_{i,p}$ are well defined and satisfy symmetry for all $i \ge r-k$ and all $p$, that is, (15) is equal to $\max^{(L)}$ when $L(\phi(S)) \le k$, and hence the estimator is specified for data with $L(v) \le k$. We then specify $A_{r-k-1,p}$ by relating it through a linear equation to higher prefix sums. This fully specifies the estimator for data with $L(v) = k+1$. The symmetry properties of $A_{r-k-1,\pi(p)}$ (it is symmetric in $\{p_1, \ldots, p_{r-k-1}\}$ and in $\{p_{r-k}, \ldots, p_r\}$) follow from the symmetry in the equation and the assumed symmetry of the higher prefix sums.

We now express $A_{r-k-1,p}$ as a linear combination of prefix sums of the form $A_{h,\pi'(p)}$, where $h \ge r-k$ and $[r-k] \subseteq \{\pi'_1, \ldots, \pi'_h\}$.

Consider a vector $z$ such that $L(z) = k+1$ and entries are sorted in nonincreasing order (the sorting permutation is the identity; this is without loss of generality, as we can permute $p$ accordingly). We show that there is a (unique) value of $A_{r-k-1,p}$ that results in an unbiased estimate for $z$. This value turns out to be independent of $z$ (it works for all vectors with $L(v) = k+1$ and the same permutation of sorted entries). When solved parametrically, this is a rational expression in $p_1, \ldots, p_r$ that satisfies the symmetry property.

The vector $z$ has $z_1 = \cdots = z_{r-k-1} = Z$ and $z_r \le \cdots \le z_{r-k} < Z$; write $Z' = z_{r-k}$. Consider the vector $z'$ that is equal to $z$ on all entries except that $z'_{r-k} = Z$. Clearly $L(z') = k$ and therefore, by induction, the estimate for $z'$ is unbiased, that is, has expectation $Z$ on data $z'$.

We relate outcomes for different data vectors that correspond to the same sample $\sigma \in 2^{[r]}$, which is the set of sampled entries. The vectors $z'$ and $z$ have the same determining vectors, and thus the same estimate, on all samples where $\sigma_{r-k} = 0$ (samples that do not include the entry $r-k$). Therefore, the estimate is unbiased on $z$ if and only if the expectation for data $z$ is equal to the expectation for data $z'$ over samples where $\sigma_{r-k} = 1$ (entry $r-k$ is sampled).

We consider the difference in the contribution to the estimate of a sample $\sigma$ that includes $r-k$ on the data vectors $z'$ and $z$.

If none of the entries $[r-k-1]$ is sampled, the determining vectors differ on the first $h \ge r-k$ entries (where $h$ is equal to $r-k$ plus the number of unsampled entries in $[r-k+1, r]$). The value of the determining vector on the first $h$ entries is $Z'$ when the data is $z$ and $Z$ when the data is $z'$. There is a sorting permutation $\pi'$ for both determining vectors which depends only on the sample $\sigma$ (it works for all choices of $z$ and respective $z'$): it has all unsampled entries in sorted order, followed by entry $r-k$, and then by the other sampled entries in sorted order. Thus, the difference in the contribution to the estimate is $A_{h,\pi'(p)}(Z - Z')$.

If at least one of the entries $[r-k-1]$ is sampled, then the determining vectors are identical on the first $h-1$ entries (value $Z$), differ on entry $h$ (the value is $Z'$ when the data is $z$ and $Z$ when it is $z'$), and are identical on the remaining entries (values at most $Z'$). Again, there is a common sorting permutation $\pi'$ for the determining vector of all choices of $z$ and of $z'$: it contains the first $r-k-1$ entries and the unsampled entries in $[r-k+1, r]$, all in sorted order, followed by $r-k$, and then the sampled entries in $[r-k+1, r]$ in sorted order (note that it is the same permutation we used for the case where none of the entries $[r-k-1]$ is sampled). Thus, the difference in the contribution to the estimate is $\alpha_{h,\pi'(p)}(Z - Z') = (A_{h,\pi'(p)} - A_{h-1,\pi'(p)})(Z - Z')$. The only samples for which $h = r-k$ are those where all entries in $[r-k, r]$ are sampled. In this case the sorting permutation of the determining vector is the identity (and, when at least one entry of $[r-k-1]$ is also sampled, the determining vectors are the respective data vectors). Thus the only "unknown" is $A_{r-k-1,p}$, and it appears when replacing $\alpha_{r-k,p} = A_{r-k,p} - A_{r-k-1,p}$.

Recall that for the estimate for $z$ to be unbiased, the expectation of these differences over samples must be 0. The expectation is the sum over samples $\sigma$ of the probability of the sample, $\Pr[\sigma] = \prod_{i \in \sigma} p_i \prod_{i \notin \sigma}(1 - p_i)$, multiplied by the difference. By equating with 0, we obtain a linear equation with one variable $A_{r-k-1,p}$, which must have a unique solution. Since all terms are multiplied by $(Z - Z')$, it factors out. The equation and its solution $A_{r-k-1,p}$ are independent of $z$. Therefore, the estimate is unbiased for all data vectors $z$ with $L(z) = k+1$.



□ 



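We now write the equations explicitly, using the notation

$$\pi^{\sigma} = \big(1, \ldots, r-k-1,\ \{i = r-k+1, \ldots, r \mid \sigma_i = 0\},\ r-k,\ \{i = r-k+1, \ldots, r \mid \sigma_i = 1\}\big),$$

$$h^{\sigma} = r - k + \sum_{i=r-k+1}^{r} (1 - \sigma_i)$$

for the sorting permutation and $h$ value used with the sample $\sigma$ (each set is listed in increasing order). The unbiasedness condition for $z$ is then

$$\sum_{\sigma:\ \sigma_{r-k} = 1} \Pr[\sigma]\Big(A_{h^\sigma,\pi^\sigma(p)} - \big(1 - I[\sigma_i = 0\ \forall i \in [r-k-1]]\big)A_{h^\sigma - 1,\pi^\sigma(p)}\Big) = 0,$$

where $I$ is the indicator function.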

We can express the equation in terms of the projection $a$ of the sample $\sigma$ on the entries $K = [r-k+1, r]$. We combine terms with identical projection while noting that $\pi^\sigma = \pi^a$ and $h^\sigma = h^a$ depend only on the projection. We eliminate common terms and obtain:

$$\sum_{a \in 2^{K}} \prod_{i \in K} p_i^{a_i}(1-p_i)^{1-a_i}\Big(A_{h^a,\pi^a(p)} - \Big(1 - \prod_{i=1}^{r-k-1}(1-p_i)\Big)A_{h^a-1,\pi^a(p)}\Big) = 0. \tag{17}$$


For $k = 0$, $K = \emptyset$, and thus there is only one term. The equation relates $A_{r-1,p}$ and $A_{r,p}$, yielding

$$A_{r-1,p} = \frac{A_{r,p}}{1 - \prod_{i=1}^{r-1}(1-p_i)}. \tag{18}$$

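For $k = 1$, $K = \{r\}$, and hence there are two terms, according to the value of $a_r$:

$$p_r\Big(\Big(1 - \prod_{i=1}^{r-2}(1-p_i)\Big)A_{r-2,p} - A_{r-1,p}\Big) = (1-p_r)\Big(A_{r,p} - \Big(1 - \prod_{i=1}^{r-2}(1-p_i)\Big)A_{r-1,(p_1,\ldots,p_{r-2},p_r,p_{r-1})}\Big).$$

Therefore,

$$A_{r-2,p} = \frac{A_{r-1,p} + \frac{1-p_r}{p_r}\Big(A_{r,p} - \Big(1 - \prod_{i=1}^{r-2}(1-p_i)\Big)A_{r-1,p'}\Big)}{1 - \prod_{i=1}^{r-2}(1-p_i)},$$

where $p' = (p_1, \ldots, p_{r-2}, p_r, p_{r-1})$.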

We conjecture that $\max^{(L)}$ is monotone, nonnegative, and dominates $\max^{(HT)}$. We verified these properties for $r \le 4$ with uniform $p$, using the following lemma and explicit computation of the coefficients.

Lemma 4.2. To establish monotonicity, nonnegativity, and dominance of $\max^{(L)}$ over $\max^{(HT)}$, it suffices to show that $\alpha_i \le 0$ for $i > 1$ and that $\alpha_1 \le 1/\prod_{i \in [r]} p_i$.

Proof. To establish monotonicity, consider two types of manipulations of a determining vector: increasing some of its maximum entries, or decreasing a maximum entry in case the maximum entry is not unique. Now, for any data $v$ and outcomes $S_1 \subset S_2$ ($S_2$ contains all entries sampled in $S_1$ and more), the determining vector of $S_2$ can be obtained from that of $S_1$ using such operations. For monotonicity, we need to show that the estimate value obtained for $v$ on outcome $S_2$ is at least that of $S_1$; equivalently, that these manipulations can only increase $\max^{(L)}$. For the second manipulation, it suffices to show that $\alpha_i \le 0$ for $i > 1$. For the first manipulation, it suffices to show that $A_i = \sum_{j=1}^{i} \alpha_j \ge 0$ for all $i \ge 1$. Since we know that $\sum_{j \in [r]} \alpha_j > 0$, this is implied by $\alpha_i \le 0$ for $i > 1$. Nonnegativity follows from monotonicity and the base case of estimate value 0 when there are no sampled entries.

To establish dominance over $\max^{(HT)}$, given monotonicity, it suffices to show that $\alpha_1 \le 1/\prod_{i \in [r]} p_i$: by monotonicity, the largest estimate on $v$ is obtained when $S = [r]$, where it equals $\sum_i \alpha_i v_{\pi_i} \le \alpha_1 \max(v)$ because $\alpha_i \le 0$ for $i > 1$. This means that all $\max^{(L)}$ estimates on a given data vector $v$ are at most the $\max^{(HT)}$ estimate, which is $\max(v)/\prod_{i \in [r]} p_i$. The HT estimate has maximum variance amongst all unbiased estimators that assume values in the range $[0, \max(v)/\prod_{i \in [r]} p_i]$. Hence, $\mathrm{VAR}[\max^{(L)}] \le \mathrm{VAR}[\max^{(HT)}]$. □



These expressions can be used to compute the estimator, but the number of different prefix sums grows exponentially with the number of distinct probabilities in the $k$ suffix of $p$. We give specific consideration to uniform probabilities.

Uniform $p$. When $p = p_1 = p_2 = \cdots = p_r$, we write $\alpha_{i,p}$ and $A_{i,p}$ (with the scalar $p$ in place of the probability vector) for the coefficients in (13) and their respective prefix sums. For a given $p$, we only need $r$ different values, $A_{i,p}$ for $i \in [r]$, to specify the estimator. We omit $p$ from the subscript for brevity. We show that for a fixed $p$, the estimator can be computed in time quadratic in the dimension.

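Theorem 4.2. The estimator, for a given $p$, can be computed in $O(r^2)$ time using the relations

$$A_r = \frac{1}{1-(1-p)^r}, \tag{19}$$

$$A_{r-k-1} = \frac{A_{r-k} + \sum_{\ell=1}^{k}\binom{k}{\ell}\left(\frac{1-p}{p}\right)^{\ell}\Big(A_{r-k+\ell} - \big(1-(1-p)^{r-k-1}\big)A_{r-k+\ell-1}\Big)}{1-(1-p)^{r-k-1}}. \tag{20}$$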



Proof. Using uniform $p$, (16) simplifies to (19). The equation (17) simplifies to

$$\sum_{\ell=0}^{k}\binom{k}{\ell}(1-p)^{\ell}p^{k-\ell}\Big(A_{r-k+\ell} - \big(1-(1-p)^{r-k-1}\big)A_{r-k+\ell-1}\Big) = 0. \tag{21}$$

We obtain (20) by expressing $A_{r-k-1}$ as a function of $A_h$ for $h \ge r-k$. This relation is a triangular system of linear equations and allows us to compute the estimator (the coefficients $\alpha_{i,p}$ for $i \in [r]$, for a given $p$) in time $O(r^2)$. □
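The triangular recursion (19)-(20) translates directly into code. The Python sketch below (helper names are hypothetical) computes the prefix sums $A_1, \ldots, A_r$ and the coefficients $\alpha_h = A_h - A_{h-1}$ for a uniform inclusion probability $p \in (0,1]$. It can be checked against the closed forms for $r = 2$ and $r = 3$ stated after the theorem.

```python
from math import comb
from typing import List


def prefix_sums_uniform(r: int, p: float) -> List[float]:
    """Return a list A with A[h] = A_h for h = 1..r (A[0] is unused),
    computed with eqs. (19)-(20); requires 0 < p <= 1."""
    q = 1.0 - p
    A = [0.0] * (r + 1)
    A[r] = 1.0 / (1.0 - q ** r)               # eq. (19)
    for k in range(r - 1):                    # k = 0, ..., r-2 determines A_{r-k-1}
        c = 1.0 - q ** (r - k - 1)
        total = A[r - k]
        for ell in range(1, k + 1):           # only already-known A_h, h >= r-k
            total += comb(k, ell) * (q / p) ** ell * (
                A[r - k + ell] - c * A[r - k + ell - 1])
        A[r - k - 1] = total / c              # eq. (20)
    return A


def coefficients_uniform(r: int, p: float) -> List[float]:
    """Return [alpha_1, ..., alpha_r] with alpha_1 = A_1 and alpha_h = A_h - A_{h-1}."""
    A = prefix_sums_uniform(r, p)
    return [A[1]] + [A[h] - A[h - 1] for h in range(2, r + 1)]
```

Each $A_{r-k-1}$ costs $O(k)$ operations, so the whole table costs $O(r^2)$, matching the theorem. For $p = 1/2$ and $r = 4$, for example, the coefficients are $608/105$, $-416/105$, $-64/105$, $-16/105$.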

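We compute the parametric form of the higher prefix sums:

$$A_r = \frac{1}{1-(1-p)^r}, \qquad A_{r-1} = \frac{1}{\big(1-(1-p)^r\big)\big(1-(1-p)^{r-1}\big)}, \qquad A_{r-2} = A_{r-1}\,\frac{1+(1-p)^{r-1}}{1-(1-p)^{r-2}}.$$

For $r = 2$, we obtain

$$A_2 = \frac{1}{p(2-p)}, \qquad A_1 = \frac{1}{p^2(2-p)}.$$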



Using $\alpha_2 = A_2 - A_1$ and $\alpha_1 = A_1$, we obtain the estimator

$$\max^{(L)}(v) = \frac{1}{p^2(2-p)}\,v_{\pi_1} - \frac{1-p}{p^2(2-p)}\,v_{\pi_2}. \tag{22}$$



For $r = 3$, we obtain

$$A_3 = \frac{1}{p(p^2-3p+3)}, \qquad A_2 = \frac{1}{p^2(p^2-3p+3)(2-p)}, \qquad A_1 = \frac{2+p^2-2p}{p^3(p^2-3p+3)(2-p)}.$$



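Using $\alpha_3 = A_3 - A_2$, $\alpha_2 = A_2 - A_1$, and $\alpha_1 = A_1$, the estimator is

$$\max^{(L)}(v) = \frac{2-2p+p^2}{p^3(2-p)(3-3p+p^2)}\,v_{\pi_1} - \frac{1-p}{p^3(3-3p+p^2)}\,v_{\pi_2} - \frac{(1-p)^2}{p^2(2-p)(3-3p+p^2)}\,v_{\pi_3}.$$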
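For experimentation, the coefficient computation and the application of the estimator for uniform $p$ (the two procedures that Algorithm 3 gives as pseudo-code) can be rendered in Python as sketched below, together with an exact unbiasedness check that enumerates all $2^r$ samples. The sketch assumes weight-oblivious Poisson sampling with a common inclusion probability $p$; the names `coeff`, `est`, and `exact_mean` are illustrative.

```python
from itertools import product
from math import comb
from typing import List, Sequence


def coeff(r: int, p: float) -> List[float]:
    """COEFF(r, p): alpha_1..alpha_r from the prefix sums of eqs. (19)-(20)."""
    q = 1.0 - p
    A = [0.0] * (r + 1)
    A[r] = 1.0 / (1.0 - q ** r)
    for k in range(r - 1):
        c = 1.0 - q ** (r - k - 1)
        total = A[r - k] + sum(
            comb(k, ell) * (q / p) ** ell * (A[r - k + ell] - c * A[r - k + ell - 1])
            for ell in range(1, k + 1))
        A[r - k - 1] = total / c
    return [A[1]] + [A[h] - A[h - 1] for h in range(2, r + 1)]


def est(sampled_values: Sequence[float], r: int, alpha: Sequence[float]) -> float:
    """EST(S, alpha): sampled_values is the multiset {v_i : i in S}."""
    if not sampled_values:
        return 0.0
    z = sorted(sampled_values, reverse=True)      # SORTDEC
    u = [z[0]] * (r - len(z)) + z                 # sorted determining vector
    return sum(a * x for a, x in zip(alpha, u))


def exact_mean(v: Sequence[float], p: float) -> float:
    """Expectation of EST over all samples of data v (each entry kept w.p. p)."""
    r = len(v)
    alpha = coeff(r, p)
    mean = 0.0
    for sigma in product((0, 1), repeat=r):
        k = sum(sigma)
        prob = p ** k * (1.0 - p) ** (r - k)
        mean += prob * est([x for x, s in zip(v, sigma) if s], r, alpha)
    return mean
```

The sorted determining vector is built by prepending $r - |S|$ copies of the largest sampled value, which is exactly where the unsampled entries land after sorting.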

Algorithm 3 includes pseudo-code for the computation of the coefficients and for the application of the estimator $\max^{(L)}$ for uniform $p$ and any $r > 1$.

Algorithm 3 $\max^{(L)}$, uniform $p$

function COEFF($r$, $p$)  ▷ compute coefficients of estimator
    $A_r \leftarrow 1/(1-(1-p)^r)$  ▷ prefix sums
    for $k = 0, \ldots, r-2$ do
        $A_{r-k-1} \leftarrow \Big(A_{r-k} + \sum_{\ell=1}^{k}\binom{k}{\ell}\big(\frac{1-p}{p}\big)^{\ell}\big(A_{r-k+\ell} - (1-(1-p)^{r-k-1})A_{r-k+\ell-1}\big)\Big) \big/ \big(1-(1-p)^{r-k-1}\big)$
    $\alpha_1 \leftarrow A_1$  ▷ compute coefficients
    for $h = 2, \ldots, r$ do
        $\alpha_h \leftarrow A_h - A_{h-1}$
    return $\alpha$

function EST($S$, $\alpha$)  ▷ Estimator applied to outcome $S$
    if $S = \emptyset$ then return 0
    $z \leftarrow \mathrm{SORTDEC}\{v_i \mid i \in S\}$  ▷ multiset of values of sampled entries, sorted in non-increasing order
    ▷ Compute sorted determining vector $u$
    for $i = 1, \ldots, r-|S|$ do
        $u_i \leftarrow z_1$
    for $i = 1, \ldots, |S|$ do
        $u_{r-|S|+i} \leftarrow z_i$
    return $\sum_{i=1}^{r} \alpha_i u_i$

4.2 Estimator $\max^{(U)}$

We now seek an estimator that prioritizes "sparse" vectors, which is captured by order-optimality where vectors with fewer positive entries precede others. Formally, we use an ordered partition according to $L(v) = |\{j \in [r] \mid v_j > 0\}|$, where part $U_h$ includes all vectors with $L(v) = h$. We derive estimators for $r = 2$, while demonstrating usage of the different constructions.

The minimum vectors are $U_0 = \{\mathbf{0}\}$. An outcome $S$ is consistent with $\mathbf{0}$ if and only if $\forall i \in S,\ v_i = 0$, and we set $\max^{(U)}(S) = 0$. This setting must be the same for all nonnegative unbiased estimators.

We first attempt to apply Algorithm 1. The determining vector $\phi(S)$ is uniquely defined by the partial order $\prec$ and obtained by substituting 0 for all unsampled entries $i \notin S$. Thus, the estimator is invariant to the choice of a total order linearizing $\prec$. Processing $U_1$, we obtain the estimate $v_i/p_i$ on all outcomes with one positive entry $v_i > 0$ amongst $i \in S$. It remains to process vectors in $U_2$. The outcomes $S'$ have $S = \{1,2\}$ with $v_1, v_2 > 0$, and hence a determining vector with two positive entries. The estimate $x$ is the solution of the linear equation $p_1p_2\,x + p_1(1-p_2)\frac{v_1}{p_1} + p_2(1-p_1)\frac{v_2}{p_2} = \max(v_1,v_2)$. The solution is summarized below; it may, however, be negative (e.g., when $v_1 = v_2$ and $p_1 + p_2 < 1$):

Outcome S        | estimate (first attempt)
$S = \emptyset$  | $0$
$S = \{1\}$      | $v_1/p_1$
$S = \{2\}$      | $v_2/p_2$
$S = \{1,2\}$    | $\frac{\max(v_1,v_2) - (1-p_1)v_2 - (1-p_2)v_1}{p_1p_2}$

To obtain a nonnegative $\prec$-optimal estimator, we must enforce the nonnegativity constraints (9) when processing $U_1$. Now the result is sensitive to the particular order of processing vectors in $U_1$. Suppose vectors of the form $(v_1, 0)$ are processed before vectors of the form $(0, v_2)$. The vector $(v_1, 0)$ is the determining vector of all outcomes with both entries sampled and values $(v_1, 0)$, and of outcomes with only the first entry sampled and value $v_1$. The probability of such an outcome given data $(v_1, 0)$ is $p_1$. To minimize variance, we would like to set the estimate to $v_1/p_1$ on these outcomes, which we can do because this setting does not violate nonnegativity (9) for other vectors. We next process vectors of the form $(0, v_2)$. They are determining vectors for outcomes $S_1$ with both entries sampled and values $(0, v_2)$, and for outcomes $S_2$ with only the second entry sampled and value $v_2$. The outcomes $S_1$ are not consistent with any other data and are not constrained by (9). The outcomes $S_2$ are also consistent with data vectors with two positive entries $(v_1, v_2)$, and therefore we need to ensure that we do not violate (9) for these vectors. To minimize the variance on $(0, v_2)$, we seek an estimate on $S_1$ that is at least the estimate on $S_2$, with the estimate on $S_2$ as large as possible without violating (9). Lastly, we process vectors with two positive entries. The outcomes determined by these vectors have both entries sampled and are not consistent with any other data vector. Summarizing, we obtain the estimator

Outcome S        | estimate (asymmetric)
$S = \emptyset$  | $0$
$S = \{1\}$      | $v_1/p_1$
$S = \{2\}$      | $\frac{v_2}{\max\{1-p_1, p_2\}}$
$S = \{1,2\}$    | $\frac{1}{p_1p_2}\Big(\max(v_1,v_2) - (1-p_2)v_1 - \frac{p_2(1-p_1)}{\max\{1-p_1, p_2\}}v_2\Big)$

This estimator is Pareto optimal but is asymmetric: the estimate changes if the entries of $v$ (and $p$) are permuted.

To obtain a symmetric estimator, we apply Algorithm 2, processing $U_1$ and $U_2$ in batches, searching for a symmetric locally optimal estimator for $U_1$ and then for $U_2$. We obtain:

Outcome S        | $\max^{(U)}(S)$
$S = \emptyset$  | $0$
$S = \{1\}$      | $\frac{v_1}{p_1(1+\max\{0, 1-p_1-p_2\})}$
$S = \{2\}$      | $\frac{v_2}{p_2(1+\max\{0, 1-p_1-p_2\})}$
$S = \{1,2\}$    | $\frac{1}{p_1p_2}\Big(\max(v_1,v_2) - \frac{v_1(1-p_2) + v_2(1-p_1)}{1+\max\{0, 1-p_1-p_2\}}\Big)$



We can see that $\max^{(U)}$ dominates $\max^{(HT)}$: this follows from $\max^{(U)} \le \max(v)/(p_1p_2)$ for data $(v_1, v_2)$.
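The nonnegativity, unbiasedness, and dominance claims for $\max^{(U)}$ can be spot-checked numerically. The sketch below is a hypothetical helper, not code from the paper. It encodes the $\max^{(U)}$ table for $r = 2$ (sample pattern `sigma`, with `sigma[i] == 1` iff entry $i$ is sampled). For a grid of data vectors and inclusion probabilities, it verifies exact unbiasedness, nonnegativity, and the bound $\max^{(U)} \le \max(v)/(p_1p_2)$ by enumerating the four sample patterns.

```python
from itertools import product


def max_U_r2(v, sigma, p):
    """max^(U) for r = 2, following the table above."""
    (v1, v2), (p1, p2) = v, p
    d = 1.0 + max(0.0, 1.0 - p1 - p2)
    if sigma[0] and sigma[1]:
        return (max(v1, v2) - ((1 - p2) * v1 + (1 - p1) * v2) / d) / (p1 * p2)
    if sigma[0]:
        return v1 / (p1 * d)
    if sigma[1]:
        return v2 / (p2 * d)
    return 0.0


def check(v, p, tol=1e-9):
    """Return (mean, all_nonnegative, all_below_HT_bound) for data v."""
    mean, nonneg, bounded = 0.0, True, True
    ht_bound = max(v) / (p[0] * p[1])
    for sigma in product((0, 1), repeat=2):
        prob = 1.0
        for s_i, p_i in zip(sigma, p):
            prob *= p_i if s_i else 1.0 - p_i
        x = max_U_r2(v, sigma, p)
        mean += prob * x
        nonneg = nonneg and x >= -tol
        bounded = bounded and x <= ht_bound + tol
    return mean, nonneg, bounded
```

The small tolerance absorbs floating-point rounding in the case $v_1 = v_2$ with $p_1 + p_2 < 1$, where the estimate on $S = \{1,2\}$ is exactly 0.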

Example. Figure 1 illustrates the relation between $\max^{(L)}$, $\max^{(U)}$, and $\max^{(HT)}$ and their variance when data vectors have the form $v = (v_1, v_2)$ and each entry is sampled independently with probability 1/2. The plot shows the ratios $\mathrm{VAR}[\max^{(L)}]/\mathrm{VAR}[\max^{(HT)}]$ and $\mathrm{VAR}[\max^{(U)}]/\mathrm{VAR}[\max^{(HT)}]$ as a function of $\min(v_1,v_2)/\max(v_1,v_2)$. We can see that $\max^{(HT)}$ is dominated by $\max^{(L)}$ and $\max^{(U)}$, and that the two Pareto optimal estimators $\max^{(L)}$ and $\max^{(U)}$ are incomparable: on inputs where one of the values is 0, $\mathrm{VAR}[\max^{(U)}] = \max(v)^2$, whereas $\mathrm{VAR}[\max^{(L)}] = (11/9)\max(v)^2$. On inputs where $v_1 = v_2$, $\mathrm{VAR}[\max^{(L)}] = (1/3)\max(v)^2$, whereas $\mathrm{VAR}[\max^{(U)}] = \max(v)^2$.
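The variances quoted in this example can be reproduced exactly with rational arithmetic. The Python sketch below (illustrative helper names) implements the three estimators for $r = 2$ with $p_1 = p_2 = 1/2$ and enumerates the four equally likely outcomes; `s[i]` is 1 iff entry $i$ is sampled.

```python
from fractions import Fraction
from itertools import product

P = Fraction(1, 2)   # p1 = p2 = 1/2, as in Figure 1


def est_HT(v, s):
    return max(v) / (P * P) if s[0] and s[1] else Fraction(0)


def est_L(v, s):
    # Determining vector: unsampled entries take the largest sampled value.
    top = max([v[i] for i in range(2) if s[i]], default=Fraction(0))
    a, b = (v[i] if s[i] else top for i in range(2))
    hi, lo = max(a, b), min(a, b)
    return (hi - (1 - P) * lo) / (P * (2 * P - P * P))   # eq. (12), p1 = p2


def est_U(v, s):
    d = 1 + max(Fraction(0), 1 - 2 * P)
    if s[0] and s[1]:
        return (max(v) - ((1 - P) * v[0] + (1 - P) * v[1]) / d) / (P * P)
    if s[0] or s[1]:
        i = 0 if s[0] else 1
        return v[i] / (P * d)
    return Fraction(0)


def variance(est, v):
    """Exact variance over the four outcomes, each with probability 1/4."""
    xs = [est(v, s) for s in product((0, 1), repeat=2)]
    mean = sum(xs) / 4
    return sum(x * x for x in xs) / 4 - mean * mean
```

Using `Fraction` keeps every value exact, so the results can be compared directly with $11/9$, $1/3$, and the other constants above.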

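Figure 1: Estimators for $\max\{v_1, v_2\}$ over Poisson samples (weight-oblivious) with $p_1 = p_2 = 1/2$. Sample distribution: each of the four outcomes ($1 \in S$ or $1 \notin S$; $2 \in S$ or $2 \notin S$) has probability 1/4. Estimate values per outcome:

Outcome S        | $\max^{(HT)}$          | $\max^{(L)}$                                   | $\max^{(U)}$
$S = \emptyset$  | $0$                    | $0$                                            | $0$
$S = \{1\}$      | $0$                    | $4v_1/3$                                       | $2v_1$
$S = \{2\}$      | $0$                    | $4v_2/3$                                       | $2v_2$
$S = \{1,2\}$    | $4\max(v_1,v_2)$       | $(8\max(v_1,v_2) - 4\min(v_1,v_2))/3$          | $2\max(v_1,v_2) - 2\min(v_1,v_2)$

Variances: $\mathrm{VAR}[\max^{(HT)}] = 3\max(v_1,v_2)^2$; $\mathrm{VAR}[\max^{(L)}] = \frac{11}{9}\max(v_1,v_2)^2 + \frac{8}{9}\min(v_1,v_2)^2 - \frac{16}{9}\max(v_1,v_2)\min(v_1,v_2) \le \frac{11}{9}\max(v_1,v_2)^2$; $\mathrm{VAR}[\max^{(U)}] = \max(v_1,v_2)^2 + 2\min(v_1,v_2)^2 - 2\max(v_1,v_2)\min(v_1,v_2) \le \max(v_1,v_2)^2$. [Plot: variance ratios with respect to $\max^{(HT)}$ as a function of min/max.]

4.3 Boolean OR

We now consider $\mathrm{OR}(v) = v_1 \vee v_2 \vee \cdots \vee v_r$ over the domain $V = \{0,1\}^r$. The best inverse-probability estimator is $\mathrm{OR}^{(HT)}(S) = 1/\prod_{i \in [r]} p_i$ when $|S| = r$ and $\bigvee_{i \in S} v_i = 1$, and $\mathrm{OR}^{(HT)}(S) = 0$ otherwise. By specializing $\max^{(L)}$ and $\max^{(U)}$, we obtain the estimators $\mathrm{OR}^{(L)}$ and $\mathrm{OR}^{(U)}$, which turn out to be optimal also in this more restricted domain. Optimality of $\mathrm{OR}^{(L)}$ follows from order optimality with respect to $\prec$ satisfying $\forall v \in V \setminus \{\mathbf{0}\},\ \mathbf{0} \prec v$, and, for $v, v' \ne \mathbf{0}$,

$$v \prec v' \iff L(v) < L(v'),$$

where $L(v) = |\{i \mid v_i = 0\}|$ is the number of zero entries in $v$. The determining vector $\phi(S)$ is obtained by setting, for $i \notin S$, $v_i \leftarrow \bigvee_{j \in S} v_j$. For $r = 2$, the estimator as a function of the determining vector (specializing (12), with $v_1 \ge v_2$) is

$$\mathrm{OR}^{(L)}(v_1, v_2) = \frac{v_1 - (1-p_1)v_2}{p_1(p_1+p_2-p_1p_2)},$$

that is, $\mathrm{OR}^{(L)} = 1/(p_1+p_2-p_1p_2)$ when the determining vector is $(1,1)$ (for example, $S = \{1\}$ with $v_1 = 1$), and $\mathrm{OR}^{(L)} = 1/(p_1(p_1+p_2-p_1p_2))$ when the determining vector is $(1,0)$ ($S = \{1,2\}$). Therefore,

$$\mathrm{VAR}[\mathrm{OR}^{(L)} \mid (1,0)] = (1-p_1) + p_1(1-p_2)\Big(\frac{1}{p_1+p_2-p_1p_2} - 1\Big)^2 + p_1p_2\Big(\frac{1}{p_1(p_1+p_2-p_1p_2)} - 1\Big)^2.$$

Optimality of $\mathrm{OR}^{(U)}$ follows by noticing that when specializing the construction of $\max^{(U)}$, the construction remains optimal with respect to an ordered partition according to $r - L(v)$.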



Figure 2 shows the variance of the estimators $\mathrm{OR}^{(HT)}$, $\mathrm{OR}^{(L)}$, and $\mathrm{OR}^{(U)}$ as a function of $p = p_1 = p_2$. The estimators $\mathrm{OR}^{(L)}$ and $\mathrm{OR}^{(U)}$ dominate $\mathrm{OR}^{(HT)}$. The estimator $\mathrm{OR}^{(L)}$ has minimum variance on $(1,1)$, and $\mathrm{OR}^{(U)}$ is the symmetric estimator with minimum variance on $(1,0)$ and $(0,1)$ (over all nonnegative unbiased estimators).



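Variance. To gain a better understanding of the relative performance of the estimators $\mathrm{OR}^{(HT)}$, $\mathrm{OR}^{(L)}$, and $\mathrm{OR}^{(U)}$, we study their variance. For data $\mathbf{0}$, all estimates are 0, and thus all three estimators have zero variance. On all data $v$ with $\mathrm{OR}(v) = 1$,

$$\mathrm{VAR}[\mathrm{OR}^{(HT)} \mid \mathrm{OR}(v) = 1] = \frac{1}{\prod_{i \in [r]} p_i} - 1. \tag{23}$$

The variances of $\mathrm{OR}^{(L)}$ and $\mathrm{OR}^{(U)}$ have a finer dependence on the data vector: the estimate $\mathrm{OR}^{(L)}$ on data vector $(1,1)$ is $1/p$ with probability $p = p_1 + p_2 - p_1p_2$ and 0 otherwise, and hence, using (1),

$$\mathrm{VAR}[\mathrm{OR}^{(L)} \mid (1,1)] = \frac{1}{p_1+p_2-p_1p_2} - 1. \tag{24}$$

The estimate for data vector $(1,0)$ is 0 with probability $1-p_1$ (entry 1 is not sampled), $1/(p_1+p_2-p_1p_2)$ with probability $p_1(1-p_2)$, and $1/(p_1(p_1+p_2-p_1p_2))$ with probability $p_1p_2$; this yields the expression for $\mathrm{VAR}[\mathrm{OR}^{(L)} \mid (1,0)]$ given earlier in this section.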



[Figure 2 plot: variance (log scale) as a function of $p = p_1 = p_2$, with curves for HT on (1,0) and (1,1), L on (1,1), L on (1,0), U on (1,1), and U on (1,0).]

Figure 2: Variance of $\mathrm{OR}^{(HT)}$, $\mathrm{OR}^{(L)}$, and $\mathrm{OR}^{(U)}$ when $p_1 = p_2 = p$, on data vectors $(1,1)$ and $(1,0)$, as a function of $p$.

Asymptotically, when $p \to 0$, for data vectors $(1,0)$, $(0,1)$, and $(1,1)$ we get that $\mathrm{VAR}[\mathrm{OR}^{(HT)}] \approx 1/p^2$. On the other hand, for the data vectors $(1,0)$ and $(0,1)$ we have that $\mathrm{VAR}[\mathrm{OR}^{(L)}], \mathrm{VAR}[\mathrm{OR}^{(U)}] \approx 1/(4p^2)$, and for the data vector $(1,1)$, $\mathrm{VAR}[\mathrm{OR}^{(L)}], \mathrm{VAR}[\mathrm{OR}^{(U)}] \approx 1/(2p)$.

This means that for data $(1,1)$ ("no change"), the variance is half the square root of the variance of $\mathrm{OR}^{(HT)}$. For data $(1,0)$ or $(0,1)$ ("change"), the variance is 1/4 of the variance of $\mathrm{OR}^{(HT)}$.
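The exact variances behind Figure 2 and the asymptotic statements above can be computed by enumerating the four sample patterns. The sketch below is illustrative (helper names are hypothetical). It specializes the $r = 2$ estimators to Boolean data with $p_1 = p_2 = p$ and prints each variance at a small $p$, so it can be compared with its claimed asymptotic form.

```python
from itertools import product


def or_HT(v, s, p):
    return 1.0 / (p * p) if s[0] and s[1] and max(v) == 1 else 0.0


def or_L(v, s, p):
    # phi(S): unsampled entries take the OR of the sampled entries; eq. (12), p1 = p2.
    top = max([v[i] for i in range(2) if s[i]], default=0)
    hi, lo = sorted((v[i] if s[i] else top for i in range(2)), reverse=True)
    return (hi - (1 - p) * lo) / (p * (2 * p - p * p))


def or_U(v, s, p):
    d = 1.0 + max(0.0, 1.0 - 2 * p)
    if s[0] and s[1]:
        return (max(v) - ((1 - p) * v[0] + (1 - p) * v[1]) / d) / (p * p)
    if s[0] or s[1]:
        return (v[0] if s[0] else v[1]) / (p * d)
    return 0.0


def variance(est, v, p):
    """Exact variance over the four sample patterns."""
    mean = second = 0.0
    for s in product((0, 1), repeat=2):
        prob = (p if s[0] else 1 - p) * (p if s[1] else 1 - p)
        x = est(v, s, p)
        mean += prob * x
        second += prob * x * x
    return second - mean * mean


p_small = 1e-3
for name, est in (("HT", or_HT), ("L", or_L), ("U", or_U)):
    print(name, variance(est, (1, 0), p_small), variance(est, (1, 1), p_small))
```

At $p = 10^{-3}$, the printed values are close to $1/p^2$ for HT, close to $1/(4p^2)$ for L and U on $(1,0)$, and close to $1/(2p)$ for L and U on $(1,1)$.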

5. POISSON: WEIGHTED, KNOWN SEEDS 

We turn our attention to weighted Poisson sampling with known seeds, starting with estimating OR over binary domains and then considering max over the nonnegative reals.

For the purpose of deriving estimators over binary domains ($V = \{0,1\}^r$), Poisson weighted sampling with known seeds is equivalent to Poisson weight-oblivious sampling (Section 4). This relation holds only for binary domains and is established through a 1-1 mapping between outcomes in terms of the information we can glean from the outcome.

The sample distribution of weighted sampling over binary domains is as follows: there is a seed vector $u \in [0,1]^r$, where the $u_i$ are independent and selected uniformly at random from the interval $[0,1]$. Defining $p \in [0,1]^r$ such that $p_i = \Pr[i \in S \mid v_i = 1]$, the outcome satisfies

$$i \in S \iff v_i = 1 \wedge u_i < p_i.$$

That is, $p_i$ is the probability that the $i$th entry is sampled if $v_i = 1$. The entry is never sampled if $v_i = 0$, but since we know $u$, if $u_i < p_i$ and $i \notin S$, we know that $v_i = 0$.

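We now map an outcome $S$ of weighted sampling with known seeds to an outcome $S'$ of weight-oblivious sampling with vector $p$:

Weighted outcome (known seeds)    | Weight-oblivious outcome $S'$
$i \in S$                         | $i \in S'$ and $v_i = 1$
$i \notin S$ and $u_i < p_i$      | $i \in S'$ and $v_i = 0$
$i \notin S$ and $u_i \ge p_i$    | $i \notin S'$

It is easy to see that $\Pr[S] = \Pr[S']$ and that $V^*(S) = V^*(S')$.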

Observe that the weighted sample $S$ is smaller than the corresponding weight-oblivious one $S'$, since entries with 0 values are not represented in the sample. Knowledge of seeds, however, compensates for this. We use knowledge of the seeds in a more elaborate way in the (significantly more involved) derivations of estimators for $\max(v)$.
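The outcome mapping can be expressed as a small function. This is an illustrative sketch with hypothetical names: entries are 0-indexed, `S` is the set of sampled entries, `u` holds the known seeds, and `p` holds the inclusion probabilities for entries with value 1. The result lists the entries of $S'$ with their now-known values.

```python
from typing import Dict, Sequence, Set


def to_oblivious_outcome(S: Set[int], u: Sequence[float],
                         p: Sequence[float]) -> Dict[int, int]:
    """Map a known-seed weighted outcome S to the weight-oblivious outcome S'."""
    revealed = {}
    for i in range(len(p)):
        if i in S:
            revealed[i] = 1      # sampled entries have value 1
        elif u[i] < p[i]:
            revealed[i] = 0      # would have been sampled had v_i = 1
        # otherwise (u_i >= p_i), entry i is not in S' and v_i stays unknown
    return revealed
```

Entry $i$ ends up in $S'$ exactly when $u_i < p_i$, an event of probability $p_i$ that does not depend on $v$. This is why $S'$ is distributed as a weight-oblivious Poisson sample.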



5.1 Boolean OR 

We state the estimators $\mathrm{OR}^{(HT)}$, $\mathrm{OR}^{(L)}$, and $\mathrm{OR}^{(U)}$ by mapping the respective estimators obtained in the weight-oblivious setting (Section 4.3).

The optimal inverse-probability estimator uses the set of outcomes $S^*$ such that $\forall i \in [r],\ u_i < p_i$. This corresponds to $S' = [r]$ in the weight-oblivious setting. If $\forall i \in [r],\ u_i < p_i$ and $\mathrm{OR}(v) = 1$, then $\mathrm{OR}^{(HT)} = 1/\prod_{i \in [r]} p_i$. Otherwise, $\mathrm{OR}^{(HT)} = 0$.



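Estimator $\mathrm{OR}^{(L)}$.

Outcome S                                                                                    | $\mathrm{OR}^{(L)}$
$S = \emptyset$                                                                              | $0$
$(S = \{1\} \wedge u_2 \ge p_2) \vee (S = \{2\} \wedge u_1 \ge p_1) \vee S = \{1,2\}$         | $\frac{1}{p_1+p_2-p_1p_2}$
$S = \{1\} \wedge u_2 < p_2$                                                                 | $\frac{1}{p_1(p_1+p_2-p_1p_2)}$
$S = \{2\} \wedge u_1 < p_1$                                                                 | $\frac{1}{p_2(p_1+p_2-p_1p_2)}$

Estimator $\mathrm{OR}^{(U)}$.

Outcome S                        | $\mathrm{OR}^{(U)}$
$S = \emptyset$                  | $0$
$S = \{1\} \wedge u_2 \ge p_2$   | $\frac{1}{p_1(1+\max\{0, 1-p_1-p_2\})}$
$S = \{2\} \wedge u_1 \ge p_1$   | $\frac{1}{p_2(1+\max\{0, 1-p_1-p_2\})}$
Else                             | $\frac{1}{p_1p_2}\Big(1 - \frac{v_1(1-p_2) + v_2(1-p_1)}{1+\max\{0, 1-p_1-p_2\}}\Big)$

In the last row, $v_i \in \{0,1\}$ denotes the value of entry $i$ in the mapped outcome $S'$.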

The variance of the estimators is the same as in the weight-oblivious case (see Section 4.3 and Figure 2). In Section 8.1 we show how our OR estimators can be applied to estimate the distinct element count (size of a union of sets), which is the sum aggregate of OR.
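The two tables can be checked for unbiasedness by enumerating the seed events $u_i < p_i$ (each of probability $p_i$), which, together with the data, determine the outcome. The Python sketch below is illustrative. Entries are 0-indexed, so `S == {0}` stands for $S = \{1\}$, and `low[i]` is True iff $u_i < p_i$.

```python
from itertools import product


def or_L_ks(S, low, p):
    """OR^(L) with known seeds, following the first table."""
    p1, p2 = p
    s = p1 + p2 - p1 * p2
    if not S:
        return 0.0
    if S == {0} and low[1]:          # entry 2 is known to be 0
        return 1.0 / (p1 * s)
    if S == {1} and low[0]:          # entry 1 is known to be 0
        return 1.0 / (p2 * s)
    return 1.0 / s


def or_U_ks(S, low, p):
    """OR^(U) with known seeds, following the second table."""
    p1, p2 = p
    d = 1.0 + max(0.0, 1.0 - p1 - p2)
    if not S:
        return 0.0
    if S == {0} and not low[1]:
        return 1.0 / (p1 * d)
    if S == {1} and not low[0]:
        return 1.0 / (p2 * d)
    w1, w2 = float(0 in S), float(1 in S)   # values in the mapped outcome S'
    return (1.0 - ((1 - p2) * w1 + (1 - p1) * w2) / d) / (p1 * p2)


def expectation(estimator, v, p):
    """Exact expectation over the seed events u_i < p_i (probability p_i each)."""
    mean = 0.0
    for low in product((False, True), repeat=2):
        prob = 1.0
        for is_low, p_i in zip(low, p):
            prob *= p_i if is_low else 1.0 - p_i
        S = {i for i in range(2) if v[i] == 1 and low[i]}
        mean += prob * estimator(S, low, p)
    return mean
```

For every Boolean data vector, both expectations equal $\mathrm{OR}(v)$, which is what the mapping to the weight-oblivious setting predicts.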

5.2 Maximum over nonnegative reals 

We study estimating max under Poisson PPS weighted sampling. The seed vector $u \in [0,1]^r$ has entries drawn independently and uniformly from $[0,1]$. $\tau^*$ is a fixed vector, and an entry $i$ is included in $S$ iff $v_i > u_i\tau^*_i$, that is, with probability $\min\{1, v_i/\tau^*_i\}$.

Recall that both $\tau^*$ and the seed vector $u$ are available to the estimator. Therefore, when $i \notin S$, we know that $v_i \le u_i\tau^*_i$.



Estimator $\max^{(HT)}$.

Consider the set of outcomes $S^*$ such that

$$S \in S^* \iff \max_{i \notin S} u_i\tau^*_i < \max_{i \in S} v_i.$$

This set includes all outcomes $S$ from which $\max(v)$ can be determined: for $S \in S^*$, $\max(v) = \max_{i \in S} v_i$. For any data vector $v$, the probability that the outcome is in $S^*$,

$$\Pr[S^*] = \prod_{i \in [r]} \min\Big\{1, \frac{\max_{j \in S} v_j}{\tau^*_i}\Big\},$$

can be computed from the outcome, for any outcome in $S^*$. The inverse-probability estimator is therefore:

$$\max^{(HT)}(S) = \begin{cases} \dfrac{\max_{i \in S} v_i}{\prod_{i \in [r]} \min\{1, \max_{j \in S} v_j/\tau^*_i\}} & \text{if } \max_{i \notin S} u_i\tau^*_i < \max_{i \in S} v_i, \\ 0 & \text{otherwise.} \end{cases}$$

This is the optimal inverse-probability estimator, since $S^*$ is the most inclusive set possible.
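A Monte Carlo sketch can be used to check that this estimator is unbiased. The code below is illustrative (function names are hypothetical). For $r = 2$ and $\tau^*_1 = \tau^*_2 = \tau^*$ with $\max(v) = \rho\tau^* < \tau^*$, the estimate is positive with probability $\rho^2$. Its variance should therefore be close to $(\tau^*)^2(1-\rho^2)$, the expression used in the variance discussion later in this section.

```python
import random
from typing import Sequence


def max_ht_pps(v: Sequence[float], u: Sequence[float],
               tau: Sequence[float]) -> float:
    """max^(HT) for PPS sampling with known seeds u and thresholds tau.

    The full v is used only to simulate the sample (i in S iff v_i > u_i*tau_i);
    the estimate itself depends only on the sampled values, u, and tau."""
    r = len(v)
    S = [i for i in range(r) if v[i] > u[i] * tau[i]]
    if not S:
        return 0.0
    m = max(v[i] for i in S)
    bound = max((u[i] * tau[i] for i in range(r) if i not in S), default=0.0)
    if bound >= m:
        return 0.0                      # outcome not in S*: max(v) not certified
    prob = 1.0
    for t in tau:
        prob *= min(1.0, m / t)         # Pr[S*], computed from the outcome
    return m / prob


def simulate(v, tau, n=200_000, seed=1):
    """Empirical mean and variance of max^(HT) over n independent seed vectors."""
    rng = random.Random(seed)
    xs = [max_ht_pps(v, [rng.random() for _ in v], tau) for _ in range(n)]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return mean, var
```

With `v = (0.5, 0.2)` and `tau = (1.0, 1.0)` ($\rho = 0.5$), the empirical mean should be near 0.5 and the empirical variance near 0.75.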

Estimator $\max^{(L)}$.

We use the partial order $\prec$ with $\mathbf{0}$ preceding all other vectors; otherwise, the order corresponds to an increasing lexicographic order on the lists $L(v)$, where $L(v)$ is the sorted multiset of differences $\{\max(v) - v_i \mid i \in [r]\}$.

$\max^{(L)}$ is order-based with respect to $\prec$ and is defined through Algorithm 1. For an outcome $S$, the set $V^*(S)$ of consistent vectors contains all vectors with $v_i$ as in $S$ for $i \in S$ and $v_i \le u_i\tau^*_i$ otherwise. The minimum consistent data vector $\min_\prec V^*(S)$ is well defined, and thus $\phi(S) = \min_\prec V^*(S)$ is $\mathbf{0}$ when $S = \emptyset$ and otherwise has $\phi(S)_i = v_i$ for $i \in S$ and $\phi(S)_i = \min\{\max_{j \in S} v_j, u_i\tau^*_i\}$ for $i \notin S$. Note that when $S \ne \emptyset$, all entries of $\phi(S)$ are positive.
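The determining vector above differs from the weight-oblivious case only in how unsampled entries are filled. A minimal Python sketch follows; the helper name is hypothetical, entries are 0-indexed, and the outcome is given as a dict from sampled index to value.

```python
from typing import Dict, List, Sequence


def determining_vector_pps(sampled: Dict[int, float], u: Sequence[float],
                           tau: Sequence[float]) -> List[float]:
    """phi(S) for max^(L) under PPS sampling with known seeds.

    An unsampled entry i is known to satisfy v_i <= u_i * tau_i, so it is set
    to the smaller of that bound and the largest sampled value."""
    r = len(u)
    if not sampled:
        return [0.0] * r
    top = max(sampled.values())
    return [sampled[i] if i in sampled else min(top, u[i] * tau[i])
            for i in range(r)]
```

For instance, with entry 0 sampled at value 0.8, seeds `(0.1, 0.5)`, and unit thresholds, the unsampled entry is capped by its seed bound and the result is `[0.8, 0.5]`.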

The estimator $\max^{(L)}$ for $r = 2$ is presented in Figure 3 using two tables. The first table shows a mapping of outcomes to determining vectors; the second states the estimator as a function of the determining vector. The derivation is in Appendix A. Monotonicity, nonnegativity, and bounded variance can be easily verified for $r = 2$ and are conjectured for $r > 2$.

Outcome S        | determining vector $\phi(S)$
$S = \emptyset$  | $(0, 0)$
$S = \{1\}$      | $(v_1, \min\{u_2\tau^*_2, v_1\})$
$S = \{2\}$      | $(\min\{u_1\tau^*_1, v_2\}, v_2)$
$S = \{1,2\}$    | $(v_1, v_2)$

[Bottom table: $\max^{(L)}(v)$ for a determining vector $v = (v_1, v_2)$ with $v_1 \ge v_2$, given case by case: $v = (0,0)$: $0$; $v_1 \ge v_2 \ge \tau^*_2$; $v_1 \ge \tau^*_1$ and $v_2 < \min\{\tau^*_2, v_1\}$: $v_1$; $v_2 \le v_1 < \min\{\tau^*_1, \tau^*_2\}$; $v_2 < \tau^*_2 \le v_1 < \tau^*_1$. The remaining case expressions involve logarithmic terms.]

Figure 3: Estimator $\max^{(L)}$ for $r = 2$. The top table maps each outcome $S$ to the determining vector $\phi(S)$. The bottom table presents the estimator as a function of the determining vector $v$ when $v_1 \ge v_2$ (symmetric expressions for the case $v_2 > v_1$ are omitted).



[Figure 4 plots. (A) and (B): curves of $\mathrm{var}[HT]/(\tau^*)^2$ and $\mathrm{var}[L]/(\tau^*)^2$ versus min/max, for max/$\tau^*$ = 0.5 and max/$\tau^*$ = 0.01. (C): variance ratio versus min/max for max/$\tau^*$ ∈ {1, 0.99, 0.5, 0.1, 0.001}.]

Figure 4: Estimators $\max^{(HT)}$ and $\max^{(L)}$ for two independent pps samples with $\tau^*_1 = \tau^*_2 = \tau^*$. (A) and (B) show the normalized variance $\mathrm{VAR}[\max]/(\tau^*)^2$ for $\rho = \max(v_1,v_2)/\tau^* \in \{0.01, 0.5\}$, as a function of $\min(v_1,v_2)/\max(v_1,v_2)$. (C) shows the variance ratio $\mathrm{VAR}[\max^{(HT)}]/\mathrm{VAR}[\max^{(L)}]$ as a function of $\min(v_1,v_2)/\max(v_1,v_2)$ for different values of $\rho$.



Variance. Figure 4 illustrates the relation between $\mathrm{VAR}[\max^{(L)}]$ and $\mathrm{VAR}[\max^{(HT)}]$ when $\tau^*_1 = \tau^*_2 = \tau^*$. The estimator $\max^{(L)}$ dominates $\max^{(HT)}$. We show the variance (divided by $(\tau^*)^2$) as a function of the ratio $\min(v)/\max(v)$. When $\max(v) \ge \tau^*$ or $v = \mathbf{0}$, $\mathrm{VAR}[\max^{(HT)} \mid v] = \mathrm{VAR}[\max^{(L)} \mid v] = 0$, and these are the only cases where there is no advantage to $\max^{(L)}$ over $\max^{(HT)}$. For all other data vectors,

$$\frac{\mathrm{VAR}[\max^{(HT)} \mid v]}{\mathrm{VAR}[\max^{(L)} \mid v]} \ge \frac{1+\rho}{\rho} > 2,$$

where $\rho = \max(v)/\tau^*$. That is, the variance ratio is at least 2 and grows as $\Omega(1/\rho)$ when $\rho$ is small.

Fixing $\rho$, the inverse-probability weight estimator is positive with probability $p = (\max(v)/\tau^*)^2 = \rho^2$. Hence, $\mathrm{VAR}[\max^{(HT)} \mid v]/(\tau^*)^2 = \rho^2(1/p - 1) = 1 - \rho^2$, which is independent of $\min(v)$. The variance of the $\max^{(L)}$ estimator decreases with $\min(v)$. For a fixed $\rho$, it is minimized when $\min(v) = \max(v)$ and is maximized when $\min(v) = 0$ ($v = (\rho\tau^*, 0)$ or $v = (0, \rho\tau^*)$). For the vector $v = (0, \rho\tau^*)$, the $\max^{(L)}$ estimator equals $\tau^* = \rho\tau^*/\rho$ with probability $\rho$ and 0 otherwise, so

$$\frac{\mathrm{VAR}[\max^{(L)} \mid (0, \rho\tau^*)]}{(\tau^*)^2} = \rho^2(1/\rho - 1) = \rho - \rho^2.$$

The variance ratio is accordingly at least

$$\frac{\mathrm{VAR}[\max^{(HT)} \mid v]}{\mathrm{VAR}[\max^{(L)} \mid v]} \ge \frac{1-\rho^2}{\rho-\rho^2} = \frac{1+\rho}{\rho}.$$

The variance ratio $\mathrm{VAR}[\max^{(HT)}]/\mathrm{VAR}[\max^{(L)}]$ is larger when entry values are closer and with lower sampling rates (larger $\tau^*$). In Section 8.2 we apply $\max^{(L)}$ to estimate the max dominance norm, which is the sum aggregate of max.

Exponentiated Range: There is no inverse-probability weight estimate for $\mathrm{RG}_d$ ($d > 0$), because on data vectors with $\min(v) = 0$ there is probability 0 of determining $\mathrm{RG}(v)$ from the outcome. We derive order-based optimal estimators for $\mathrm{RG}_d$ ($d > 0$) in [16].

6. POISSON: WEIGHTED, UNKNOWN SEEDS 

We show that when seeds are not available to the estimator, it is 
not possible to obtain a nonnegative unbiased estimator for £"^(v) 
where £ < r and for RG^ (d > 0) with weighted Poisson sam- 
pling. This impossibility results also holds for Boolean values and 
estimating OR and XOR of 2 or more bits. 

This result is related to a negative result by Charikar et al. [7] for estimating distinct counts, which is the sum aggregate of the OR primitive. They showed that most of the data set needs to be sampled in order to obtain a constant error with constant probability on the distinct count. Their model essentially corresponds to sampling with unknown seeds.
This result completes our understanding of when nonnegative unbiased quantile estimators over Poisson samples exist: inverse-probability weight estimators exist when sampling is weight-oblivious (Section 4), when sampling is weighted and seeds are known ([17, 18] and Section 5), and when sampling is weighted with unknown seeds and we estimate min ($\ell = r$) (we obtain inverse-probability weights with respect to the set $S^*$ of all outcomes with $S = [r]$).

Theorem 6.1. For any $\ell < r$, there is no unbiased nonnegative estimator for $\ell^{th}(v)$ over independent weighted samples with unknown seeds.

Proof. Recall that with weighted sampling, an entry where $v_i = 0$ is never sampled. As seeds are not available, the outcome gives no information on the values of entries that are not sampled. Therefore, the set $V^*(S)$ of data vectors consistent with $S$ includes all vectors in $V$ that agree with $S$ on the sampled entries.

We first establish the claim for $r = 2$. Since our arguments use values restricted to $\{0, 1\}$, they also hold for $OR(v_1, v_2)$. Let $p_i$ be the inclusion probability of entry $i$ when $v_i = 1$. We show that when $p_1 + p_2 < 1$, there is no nonnegative estimator that is simultaneously unbiased for the four data vectors $(1, 1)$, $(1, 0)$, $(0, 1)$, $(0, 0)$.

On the outcome $S = \emptyset$, which is the only possible outcome for data $(0, 0)$, we must have $\widehat{OR}(S) = 0$ to be unbiased on $(0, 0)$. When $S = \{i\}$ ($v_i = 1$), the estimator must have expected value $1/p_i$ in order to be unbiased for $(1, 0)$ or $(0, 1)$. When the data is $(1, 1)$, the contribution to the expectation from outcomes with exactly one sampled entry is $p_1(1 - p_2)/p_1 + p_2(1 - p_1)/p_2 = 2 - p_1 - p_2 > 1$. In order to be unbiased, the estimator must have negative expectation on the outcome $S = \{1, 2\}$, which contradicts nonnegativity.
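The arithmetic of this step can be traced mechanically. The sketch below (the function name is hypothetical) takes $p_1, p_2$, imposes the constraints derived above from data $(0,0)$, $(1,0)$, and $(0,1)$, and returns the expected estimate that unbiasedness on $(1,1)$ then forces on the outcome $S = \{1,2\}$. It equals $(p_1 + p_2 - 1)/(p_1 p_2)$, which is negative exactly when $p_1 + p_2 < 1$.

```python
def forced_estimate_on_both_sampled(p1: float, p2: float) -> float:
    """Expected estimate that an unbiased OR estimator (unknown seeds) must assign
    to S = {1, 2} on data (1, 1), given E[estimate | S = {i}] = 1 / p_i."""
    # Contribution of outcomes with exactly one sampled entry under data (1, 1).
    one_sampled = p1 * (1 - p2) / p1 + p2 * (1 - p1) / p2  # = 2 - p1 - p2
    # Unbiasedness on (1, 1): one_sampled + p1 * p2 * x = OR(1, 1) = 1.
    return (1 - one_sampled) / (p1 * p2)
```

For example, $p_1 = p_2 = 0.25$ forces an expected estimate of $-8$ on $S = \{1,2\}$, while $p_1 = p_2 = 0.6$ leaves a positive value, so the argument needs $p_1 + p_2 < 1$.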

Lastly, we extend the argument to $\ell^{th}(v)$ and general $r$. We consider the four data vectors where $v_3 = \cdots = v_{\ell+1} = 1$, $v_{\ell+2} = \cdots = v_r = 0$, and $(v_1, v_2) \in \{0, 1\}^2$. Let $p_i > 0$ be the sampling probability of entry $i$ when $v_i = 1$, and assume that $p_1 + p_2 < 1$. On these vectors, $\ell^{th}(v) = OR(v_1, v_2)$. If neither 1 nor 2 is sampled, we have at most $\ell - 1$ positive sampled entries and the estimate must be 0. On outcomes with exactly one $i \in \{1, 2\}$ sampled, the expectation of the estimator must be $1/p_i$ to be unbiased for the data vectors with $(v_1, v_2) = (1, 0), (0, 1)$. The contribution of the estimator from these outcomes for data with $v_1 = v_2 = 1$ is $2 - p_1 - p_2 > 1$, a contradiction. □

The argument for $RG_d$ ($d > 0$) is simpler. Consider estimating the XOR of two bits with possible data $(0, 0)$, $(1, 1)$, and $(1, 0)$. The estimate value must be zero on outcomes with only one sampled entry. This is needed to guarantee nonnegativity for data vectors where the other, unseen entry is equal to the sampled one. Consider now data $(1, 0)$. The two possible outcomes are that only the first entry is sampled or that neither entry is sampled, with zero estimate value in both cases. Thus, the expectation of the estimator is 0, whereas $RG_d(1, 0) = 1$, a contradiction to unbiasedness.

7. ESTIMATING SUM AGGREGATES 

When data is aggressively sampled, our basic estimators for individual quantile or range queries have high variance. When the query is an aggregate (the sum of many basic queries), we can estimate it through the sum of the respective basic estimators. Since our estimators are unbiased, when estimates are independent, variance is additive and the relative error decreases with aggregation.
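A minimal sketch of why aggregation helps. For simplicity it assumes weight-oblivious Poisson sampling with a common inclusion probability $p$ and per-key HT estimates $v(h)/p$; this is a stand-in chosen for illustration, not one of the multi-instance estimators of this paper. The per-key estimates are independent, so their variances add, and for $n$ keys of similar value the relative error of the sum shrinks roughly like $1/\sqrt{n}$. The helper names are hypothetical.

```python
import random
import statistics


def ht_sum_estimate(values, p, rng):
    """Sum of independent per-key HT estimates: v / p if the key is sampled, else 0."""
    return sum(v / p for v in values if rng.random() < p)


def empirical_relative_error(values, p, trials=2000, seed=1):
    """Standard deviation of the summed estimate divided by the true sum."""
    rng = random.Random(seed)
    estimates = [ht_sum_estimate(values, p, rng) for _ in range(trials)]
    return statistics.pstdev(estimates) / sum(values)


def predicted_relative_error(values, p):
    """Variances add: VAR = sum over keys of v^2 (1/p - 1)."""
    return (sum(v * v for v in values) * (1 / p - 1)) ** 0.5 / sum(values)
```

For unit values at $p = 0.1$, the predicted relative error is about $0.95$ for 10 keys and about $0.095$ for 1000 keys.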

The data is modeled as a set $I$ of instances, where each instance $i \in I$ is an assignment of values (weights) to a set of keys $K$. For a key $h$, $v(h)$ is the vector containing the values of $h$ in different instances. That is, entry $i$ of this vector, $v_i(h)$, is the value of key $h \in K$ in instance $i \in I$. Figure 5(A) shows a data set with 3 instances $I = \{1, 2, 3\}$ and 6 keys $K = \{1, \ldots, 6\}$.

Sum aggregates have the form $\sum_{h \in K'} f(v(h))$, where $K' \subseteq K$ are selected keys. The primitives (functions $f$) include quantiles (max, min, $\ell$th largest entry) and the exponentiated range $RG_d = (\max(v) - \min(v))^d$, and are applied to the values of a single key across multiple instances. The sum aggregates for max, min, and $RG_1$ over two instances are known as the max-dominance norm, the min-dominance norm [19, 20], and the $L_1$ distance. The $L_2$ distance is the square root of a sum aggregate of $RG_2$. When values are binary, each instance can be viewed as a set, and the sum aggregate for OR is the number of distinct keys (the size of the union).

For the example data set in Figure 5(A), the max dominance norm over even keys ($K' = \{2, 4, 6\}$) and instances $\{1, 2\}$ is $10 + 20 + 10 = 40$. The $L_1$ distance between instances $\{2, 3\}$ over keys $K' = \{1, 2, 3\}$ is $10 + 5 + 3 = 18$.
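The primitives and their sum aggregates can be written down directly. The sketch below uses a small hypothetical data set (not the one in Figure 5), stored as a mapping from instance to per-key values in which a missing key means value 0. The helper names are illustrative.

```python
import math


def values_of(data, key, instances):
    """The vector v(h): the values of one key across the selected instances."""
    return [data[i].get(key, 0) for i in instances]


def sum_aggregate(data, f, keys, instances):
    """Sum over the selected keys K' of a primitive f applied to v(h)."""
    return sum(f(values_of(data, h, instances)) for h in keys)


def rg(d):
    """Exponentiated range RG_d(v) = (max(v) - min(v))^d."""
    return lambda v: (max(v) - min(v)) ** d


def or_(v):
    """OR primitive: 1 if any entry is positive, else 0."""
    return 1 if any(x > 0 for x in v) else 0


# Hypothetical example: two instances over keys "a", "b", "c".
data = {1: {"a": 3, "c": 5}, 2: {"a": 1, "b": 4, "c": 5}}
keys, inst = ["a", "b", "c"], [1, 2]
max_dominance = sum_aggregate(data, max, keys, inst)             # 3 + 4 + 5 = 12
min_dominance = sum_aggregate(data, min, keys, inst)             # 1 + 0 + 5 = 6
l1_distance = sum_aggregate(data, rg(1), keys, inst)             # 2 + 4 + 0 = 6
l2_distance = math.sqrt(sum_aggregate(data, rg(2), keys, inst))  # sqrt(4 + 16 + 0)
distinct = sum_aggregate(data, or_, keys, inst)                  # 3
```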

Applications 

Primary data sources structured as instances of values assigned to keys include snapshots of changing data, such as terms and their frequencies or sensor locations and measurement values, and request logs recording activity (values) for different resources (keys), such as the number of requests to each URL in Web traffic logs and the bytes sent to each destination IP address in network traffic logs.

We classify queries as single-instance, multi-instance, or decomposable. Single-instance queries are over data from a single instance. Decomposable queries can be stated as a nonnegative sum of single-instance queries, and can be estimated using a corresponding sum of single-instance estimators. Multi-instance queries are those that involve multiple instances and cannot be decomposed; they are the ones targeted in our work. A single-instance query example on daily request logs is "total number of requests to .gov URLs on Monday." A decomposable query example is "total number of requests to .gov URLs in the past week," which can be posed as the sum of single-instance queries. Multi-instance queries include difference norms and distinct counts across days.

We aim to find an optimal estimator when the query and under- 
lying sampling scheme are given. The choice of sampling scheme 
is according to efficiency of processing on the data source and ef- 
fectiveness on the queries of interest. Since the same sample might 
be used for different classes of queries, it is not necessarily opti- 
mized for a particular one. We review sampling methods, starting 
with single-instance, and the joint distributions (coordinated or in- 
dependent) for multiple instances, and then show how to estimate 
sum aggregates using single-key estimators. 

7.1 Sampling a single instance 

We review popular summarization methods of a single instance: 
Poisson, bottom-k, and VarOpt sampling.

Sampling can be weighted or weight-oblivious. If sampling is 
weighted then the probability of including each key in the sample 
depends on its value v(h). When sampling is weight-oblivious then 
inclusion probability does not depend on value. 

The distinction between weighted and weight-oblivious sampling 
is important, even when values are binary, when sampling sparse 
data sets involving multiple keys. When most keys have zero val- 
ues and only positive values are explicitly represented in the data, 
a weighted sampling algorithm needs to process only these keys 
and generates a sample containing only such keys, whereas weight-oblivious sampling is applied to the full domain of keys (an example is the set of active destination IP addresses at a given gateway, which is a small fraction of the key space of all possible IP addresses).

[Figure 5 panels: (A) example data set; (B) per-key values of example functions such as RG(v_1, v_2, v_3); (C) consistent shared-seed PPS ranks and independent PPS ranks, with the bottom-3 samples of instances 1, 2, 3 being {3, 1, 6}, {1, 6, 4}, {3, 1, 5} (shared seed) and {3, 1, 6}, {1, 6, 4}, {3, 5, 2} (independent).]

Figure 5: (A): Example data set with keys $K = \{1, \ldots, 6\}$ and instances $\{1, 2, 3\}$. (B): Per-key values for example aggregates. (C): Random rank assignments and corresponding 3-order samples.

Bottom-k and Poisson samples are defined through a random rank assignment $r$ which maps keys to ranks [36, 9, 12, 22, 13, 14]. Rank values of different keys are independent. For each key $h$, the dependence of the rank value $r(h)$ on the weight $v(h)$ is captured by a family of probability density functions $f_w$ ($w \ge 0$): the rank $r(h)$ is drawn from $f_{v(h)}$.

• A Poisson sample is specified by a threshold $\tau$ or an expected sample size $k$, where $k = \sum_h F_{v(h)}(\tau)$ and $F_w$ is the CDF of $f_w$. The sample is the set of keys with $r(h) < \tau$. Since ranks of different keys are independent, so are inclusions in the sample.

• A bottom-k sample contains the k keys of smallest rank. 

We can decouple the dependency of the rank on $v(h)$ from its dependency on the randomization: each key obtains (independently) a random seed value $u(h) \in U[0, 1]$. The rank is then determined by the seed $u(h)$ and the value $v(h)$ to be $r(h) = F^{-1}_{v(h)}(u(h))$. Two families $f_w$ are used for weighted sampling:

• EXP ranks: $f_w(x) = w e^{-wx}$ ($F_w(x) = 1 - e^{-wx}$), that is, ranks are exponentially distributed with parameter $w$ (denoted by EXP[$w$]). Equivalently, if $u \in U[0, 1]$ then $-\ln(u)/w$ is an exponential random variable with parameter $w$. EXP[$w$] ranks have the useful property that the minimum rank over a subpopulation $K'$ has distribution EXP[$v(K')$], where $v(K') = \sum_{h \in K'} v(h)$. A bottom-k sample is equivalent to taking $k$ weighted samples without replacement, where at each step a key is selected with probability equal to the ratio of $v(h)$ and the total value of the remaining keys [36, 9, 11, 23, 12, 13].

• PPS ranks: $f_w$ is the uniform distribution $U[0, 1/w]$ ($F_w(x) = \min\{1, wx\}$). This is equivalent to choosing rank value $u/w$, where $u \in U[0, 1]$. The Poisson-$\tau$ sample is a PPS sample [28] (Inclusion Probability Proportional to Size). The bottom-k sample is a priority sample [33, 22] (PRI).
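The two rank families and the two sample types can be sketched as follows. This is a minimal illustration, assuming seeds $u(h)$ are given as numbers in $[0,1)$ and that keys with value $0$ get rank $+\infty$, so they are never included by weighted sampling. The EXP rank is computed as $-\ln(1-u)/w$, which has the same distribution as $-\ln(u)/w$. Function names and data are hypothetical.

```python
import math


def pps_rank(u, w):
    """PPS rank F_w^{-1}(u) for F_w(x) = min(1, w x), i.e. u / w."""
    return u / w if w > 0 else math.inf


def exp_rank(u, w):
    """EXP[w] rank via the inverse CDF of F_w(x) = 1 - exp(-w x)."""
    return -math.log(1.0 - u) / w if w > 0 else math.inf


def assign_ranks(values, seeds, rank_fn):
    """r(h) = F^{-1}_{v(h)}(u(h)) for every key."""
    return {h: rank_fn(seeds[h], values[h]) for h in values}


def poisson_sample(ranks, tau):
    """Poisson-tau sample: the keys whose rank is below the threshold tau."""
    return {h for h, r in ranks.items() if r < tau}


def bottom_k_sample(ranks, k):
    """Bottom-k sample: the k keys of smallest rank."""
    return set(sorted(ranks, key=ranks.get)[:k])


# Hypothetical values and seeds.
values = {"a": 5.0, "b": 1.0, "c": 0.0, "d": 2.0}
seeds = {"a": 0.5, "b": 0.2, "c": 0.9, "d": 0.1}
ranks = assign_ranks(values, seeds, pps_rank)  # a: 0.1, b: 0.2, c: inf, d: 0.05
poisson = poisson_sample(ranks, 0.15)          # {"a", "d"}
bottom2 = bottom_k_sample(ranks, 2)            # {"a", "d"}
```

With PPS ranks, the bottom-k sample here is a priority sample and the Poisson-$\tau$ sample includes key $h$ with probability $\min\{1, v(h)\tau\}$.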

Poisson sampling has the disadvantage that the actual sample size varies. Bottom-k sampling has a fixed sample size, but the dependence between keys complicates the design of the estimators and their analysis. VarOpt samples [10], which we do not define here, have PPS inclusion probabilities and a fixed sample size. In VarOpt samples, inclusions of different keys have nonpositive correlations, which improves estimation quality. It is not clear, however, whether we can incorporate "known seeds" into VarOpt sampling.

Bottom-k, Poisson, and VarOpt sampling can be implemented efficiently on a data stream. Poisson sampling, where inclusions of different keys are independent, is applicable even when the sampling of different keys must be completely decoupled (such as when transmitting sampled sensor measurements).



Estimators 

We estimate sum aggregates using linear estimators of the form $\sum_{h \in K'} \hat{f}(h)$. An estimate $\hat{f}(h)$ is assigned to each key such that positive estimates are assigned only to keys included in the sample and estimates of other keys are 0. It follows that the estimate of the sum aggregate over $K'$ is equal to the sum of the individual estimates of the keys of $K'$ included in the sample $S$: $\sum_{h \in K' \cap S} \hat{f}(h)$. From linearity of expectation, when the estimates $\hat{f}(h)$ are unbiased, so is the estimate of the sum.
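For a Poisson PPS sample with threshold $\tau$, key $h$ is included with probability $\min\{1, v(h)\tau\}$, so the linear HT estimator for a subpopulation sum can be sketched as below. Only sampled keys are touched. The function names, the data, and the Monte Carlo check of unbiasedness are illustrative assumptions.

```python
import random


def ht_subpopulation_sum(sample, values, tau, selected):
    """Linear HT estimate of sum_{h in K'} v(h) from a PPS Poisson-tau sample.
    A sampled key h gets v(h) / min(1, v(h) * tau); unsampled keys contribute 0."""
    return sum(values[h] / min(1.0, values[h] * tau) for h in sample if h in selected)


def draw_pps_poisson(values, tau, rng):
    """Include h iff its PPS rank u(h) / v(h) is below tau (keys with v(h) = 0 never are)."""
    return {h for h, v in values.items() if v > 0 and rng.random() / v < tau}


values = {"a": 5.0, "b": 1.0, "d": 2.0}  # hypothetical data
selected = {"a", "b"}                    # K'; the true sum is 6
rng = random.Random(3)
trials = 50_000
mean = sum(
    ht_subpopulation_sum(draw_pps_poisson(values, 0.3, rng), values, 0.3, selected)
    for _ in range(trials)
) / trials
print(mean)  # close to 6
```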

The HT estimator, which assigns inverse-probability weights to sampled keys, is applicable to Poisson and VarOpt samples, where inclusion probabilities are available. With bottom-k samples, the inclusion probability of a key depends on the weight distribution of all other keys. Tight unbiased estimators for subpopulation queries over bottom-k samples were proposed only recently [22, 38, 12, 13]. The main insight was a delicate application of HT: the estimate for each key was obtained by applying inverse-probability weighting under the conditioning that the rank values of all other keys were fixed. This method, which we termed rank conditioning (RC), facilitated treating bottom-k samples like independent samples for the purposes of estimator design. While conditioning clearly increases variance with respect to the unattainable HT estimates, it turns out that the performance loss is minimal [38].
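Rank conditioning for a bottom-k sample with PPS ranks (a priority sample) can be sketched as follows, assuming $r(h) = u(h)/v(h)$. With all other ranks held fixed, a key $h$ is in the sample iff $r(h)$ is below the $k$th smallest rank among the other keys. For a sampled key, that rank is the $(k+1)$st smallest rank overall, $\tau$. Its conditional inclusion probability is therefore $\min\{1, v(h)\tau\}$, which gives the estimate $\max\{v(h), 1/\tau\}$. The names and the Monte Carlo check are hypothetical.

```python
import random


def priority_rc_estimates(values, seeds, k):
    """RC estimates (HT under rank conditioning) for a bottom-k PPS sample."""
    ranks = {h: seeds[h] / v for h, v in values.items() if v > 0}
    order = sorted(ranks, key=ranks.get)
    if len(order) <= k:
        return {h: values[h] for h in order}  # every positive key is sampled
    tau = ranks[order[k]]  # (k+1)-st smallest rank
    return {h: max(values[h], 1.0 / tau) for h in order[:k]}


values = {1: 5.0, 2: 1.0, 3: 2.0, 4: 0.5}  # hypothetical data, total 8.5
rng = random.Random(11)
trials = 50_000
total = 0.0
for _ in range(trials):
    seeds = {h: rng.random() for h in values}
    total += sum(priority_rc_estimates(values, seeds, k=2).values())
mean = total / trials
print(mean)  # close to 8.5
```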

7.2 Multiple Instances 

Dispersed multiple instances are summarized independently, and therefore, for each key, the sampling of each entry $v_i(h)$ of the data vector $v(h)$ is independent of the values of other entries. Samples of different instances can be independent, which is the model we studied here in more detail, but they can also be coordinated.

Coordinated sampling: Estimation of many multi-instance functions, including quantile and difference queries, can be significantly improved by coordinating the sampling of different instances. Coordination means that a key that is sampled in one instance is more likely to be sampled in other instances: similar instances yield similar samples. A particular form of sample coordination, the PRN (Permanent Random Numbers) method, was used in survey sampling for almost four decades for Poisson [3] and order [37, 34, 36] samples.

Coordination was (re-)introduced in computer science [5, 4, 9, 25, 26, 2, 13, 27, 14, 11] to address challenges of massive data sets and to facilitate tighter estimates of aggregates over multiple instances: initially for 0/1 values (where instances are sets and sum aggregates of multi-instance functions correspond to set operations), and recently [17] for weighted data.

A particular form of coordination, applicable to bottom-k and Poisson samples, is through consistent ranks. Rank assignments $r_i$ of different instances are consistent if for each key $h$, $v_i(h) \ge v_j(h) \Rightarrow r_i(h) \le r_j(h)$ (in particular, if entries are equal then so are the ranks). One can get consistent ranks by sharing the seed $u(h) \in U[0, 1]$ for a particular key $h$ across instances [17]. This is easily achieved if seeds are determined by random hash functions. For each instance $i$ and key $h$, we assign the rank value $r_i(h) = F^{-1}_{v_i(h)}(u(h))$. For PPS ranks, $r_i(h) = u(h)/v_i(h)$, and for EXP ranks, $r_i(h) = -\ln(1 - u(h))/v_i(h)$.
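Shared seeds derived from a hash of the key give consistent ranks across instances without any communication between the summarizers. The sketch below derives a seed in $[0,1)$ from the top 53 bits of SHA-256; that hash is an arbitrary choice for illustration, and any good random hash would serve. It then applies the PPS and EXP formulas above. The `salt` parameter is a hypothetical way to obtain different (independent-looking) seeds per instance when independent sampling is wanted instead.

```python
import hashlib
import math


def hashed_seed(key, salt="shared"):
    """Reproducible seed u(h) in [0, 1) from the top 53 bits of SHA-256(salt|key)."""
    digest = hashlib.sha256(f"{salt}|{key}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def pps_rank(u, v):
    return u / v if v > 0 else math.inf


def exp_rank(u, v):
    return -math.log(1.0 - u) / v if v > 0 else math.inf


def consistent_ranks(instances, rank_fn, salt="shared"):
    """r_i(h) = F^{-1}_{v_i(h)}(u(h)) with the same seed u(h) in every instance."""
    return {
        i: {h: rank_fn(hashed_seed(h, salt), v) for h, v in inst.items()}
        for i, inst in instances.items()
    }


instances = {1: {"x": 3.0, "y": 1.0}, 2: {"x": 7.0, "y": 1.0}}  # hypothetical
pps = consistent_ranks(instances, pps_rank)
exp_r = consistent_ranks(instances, exp_rank)
# Larger value gives a smaller rank; equal values give equal ranks.
```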

On decomposable queries, however, coordination results in larger variance than independent samples, due to strong positive correlation between samples of different instances. Thus, independent sampling is preferable when the query workload is dominated by decomposable queries. Coordination also results in an unbalanced burden, since the same keys are consistently sampled. This is a drawback when sampling is used to limit transmissions to save sensor battery power.

Knowledge of random seeds. We can get better estimates if we know the random seeds when we compute the estimator. This is possible (without the overhead of incorporating them in the sample) with random-rank-based weighted sampling, if the seed $u_i(h)$ for instance $i$ is determined by a random hash function of the key $h$. Knowledge of the seed allows the estimator to obtain some information (an upper bound) on the value $v_i(h)$ even if it is not sampled. For example, in Poisson sampling we know that $v_i(h)$ must satisfy $\tau \le F^{-1}_{v_i(h)}(u_i(h))$. Since for weighted sampling $F_w$ dominates $F_{w'}$ for $w > w'$, this gives an upper bound on $v_i(h)$. With a bottom-k sample, we define $\tau$ to be the $(k+1)$st smallest rank in $K$ and also obtain a similar upper bound when $v_i(h)$ is not sampled.
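Under the assumptions in the text (Poisson sampling with threshold $\tau$, or bottom-k with $\tau$ taken as the $(k+1)$st smallest rank), a known seed turns non-inclusion into an upper bound. For PPS ranks, $u/v \ge \tau$ gives $v \le u/\tau$; for EXP ranks, $-\ln(1-u)/v \ge \tau$ gives $v \le -\ln(1-u)/\tau$. A sketch with hypothetical function names:

```python
import math


def pps_upper_bound(u, tau):
    """Largest value consistent with a key NOT being in a PPS sample (u / v >= tau)."""
    return u / tau


def exp_upper_bound(u, tau):
    """Same for EXP ranks (-ln(1 - u) / v >= tau)."""
    return -math.log(1.0 - u) / tau


def bottom_k_threshold(ranks, k):
    """tau for a bottom-k sample: the (k+1)-st smallest rank (inf if there is none)."""
    ordered = sorted(ranks.values())
    return ordered[k] if len(ordered) > k else math.inf
```

For example, a key with value $1.5$ and seed $0.5$ has PPS rank $1/3$ and is not sampled at $\tau = 0.25$; the seed alone then tells the estimator that $v \le 0.5/0.25 = 2$.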

With coordinated sampling, the seeds $u_i(h)$ must be hash-generated and reproducible, since they must be available for the summarization of different instances. With independent sampling of instances, implementations may also use reproducible seeds. We show that knowledge of seeds enhances the estimation scope and accuracy of some multi-instance functions.

8. APPLICATIONS TO SUM AGGREGATES 
8.1 Distinct count 

Consider two instances with binary values. Each instance can be viewed as a subset of the set $K$ of all possible keys, consisting of all keys that have value 1. We are interested in the size of the union of the two sets, that is, the number of distinct keys that occur in at least one instance. The distinct count is a sum aggregate with the function $OR(v_1(h), v_2(h))$.

Suppose sampling of instances is independent with known seeds: the sampling of each instance can be Poisson or bottom-k, but the random seeds used are independent across instances. We estimate the sum by applying the estimators of Section 5 to each key, and summing the estimates. As a side note, recall that more accurate estimates are possible by coordinating the samples of different instances, but coordination may not be possible or desirable in some situations. Also recall that we showed in Section 6 that if seeds are not known, there is no nonnegative unbiased estimator.

As a motivating application, consider two periodic logs of resource requests. Each time period (instance) $i = 1, 2$ has a set $N_i$ of active resources (say, resources requested at least once). The set $N_i$ is then summarized via Poisson or bottom-k sampling using random seeds $u_i(h)$ and sampling probability $p_i$, to obtain a sample $S_i$. For Poisson sampling, for all $h \in N_i$ we have $h \in S_i \iff u_i(h) < p_i$. For bottom-k sampling, $S_i$ includes the $k$ keys in $N_i$ with the smallest $u_i(h)$ values. In this case, we use the $(k+1)$st smallest $u_i(h)$ for $p_i$. The random hash functions are such that the $u_i(h)$ are independent for $i = 1, 2$.



From the samples $S_1$ and $S_2$, and having access to $u_i$ and $p_i$, we want to estimate $D_A = |(N_1 \cup N_2) \cap A|$, the number of distinct keys in $N_1 \cup N_2$ that satisfy some selection rule $A$.

To apply the estimators of Section 5, we first categorize sampled keys according to the information we have on their membership in $N_1$ and $N_2$.

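$$\begin{aligned}
h \in F_{1?} &\iff h \in S_1 \wedge u_2(h) > p_2 \\
h \in F_{?1} &\iff h \in S_2 \wedge u_1(h) > p_1 \\
h \in F_{11} &\iff h \in S_1 \cap S_2 \\
h \in F_{10} &\iff h \in S_1 \setminus S_2 \wedge u_2(h) < p_2 \\
h \in F_{01} &\iff h \in S_2 \setminus S_1 \wedge u_1(h) < p_1
\end{aligned}$$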

The HT estimate and variance are

$$\hat{D}_A^{(HT)} = \frac{|A \cap (F_{11} \cup F_{10} \cup F_{01})|}{p_1 p_2} , \qquad \mathrm{VAR}[\hat{D}_A^{(HT)}] = |D_A|\left(\frac{1}{p_1 p_2} - 1\right) .$$



The L estimate is

$$\hat{D}_A^{(L)} = \frac{|A \cap (F_{1?} \cup F_{?1} \cup F_{11})|}{p_1 + p_2 - p_1 p_2} + \frac{|A \cap F_{10}|}{p_1(p_1 + p_2 - p_1 p_2)} + \frac{|A \cap F_{01}|}{p_2(p_1 + p_2 - p_1 p_2)} .$$

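The variance is

$$\mathrm{VAR}[\hat{D}_A^{(L)}] = |D_A|\,J_A\,\mathrm{VAR}[OR^{(L)} \mid (1, 1)] + |D_A|(1 - J_A)\,\mathrm{VAR}[OR^{(L)} \mid (1, 0)] ,$$

where $J_A = \frac{|N_1 \cap N_2 \cap A|}{|(N_1 \cup N_2) \cap A|}$ is the Jaccard coefficient.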
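The categorization and both estimators fit in a short simulation. This sketch assumes Poisson sampling with independent uniform seeds per instance, generated with Python's `random` as stand-ins for hash values, and takes $A = N_1 \cup N_2$. The variance helper computes the two per-key variances directly from the values the L estimator takes on each outcome; that small derivation and all names are ours. The point is only to illustrate that both estimators are unbiased and that $\hat{D}_A^{(L)}$ has lower variance, not to reproduce the paper's experiments.

```python
import random
import statistics


def estimate_distinct(N1, N2, p1, p2, rng):
    """One draw of (HT, L) estimates of |N1 ∪ N2| from independent Poisson
    samples with known seeds (here A = N1 ∪ N2)."""
    keys = N1 | N2
    # Known seeds for every key in both instances (stand-ins for hash values).
    u1 = {h: rng.random() for h in keys}
    u2 = {h: rng.random() for h in keys}
    S1 = {h for h in N1 if u1[h] < p1}
    S2 = {h for h in N2 if u2[h] < p2}
    F = {"11": 0, "10": 0, "01": 0, "1?": 0, "?1": 0}
    for h in S1 | S2:
        if h in S1 and h in S2:
            F["11"] += 1
        elif h in S1:
            # u2 < p2 while h is not in S2 reveals h not in N2; otherwise unknown.
            F["10" if u2[h] < p2 else "1?"] += 1
        else:
            F["01" if u1[h] < p1 else "?1"] += 1
    q = p1 + p2 - p1 * p2
    ht = (F["11"] + F["10"] + F["01"]) / (p1 * p2)
    l_est = (F["11"] + F["1?"] + F["?1"]) / q + F["10"] / (p1 * q) + F["01"] / (p2 * q)
    return ht, l_est


def or_l_variances(p1, p2):
    """Per-key VAR[OR^(L) | (1,1)] and VAR[OR^(L) | (1,0)] for the L estimator above."""
    q = p1 + p2 - p1 * p2
    var11 = 1 / q - 1  # estimate 1/q on every nonempty outcome
    # Under (1,0): F_1? w.p. p1(1-p2) with estimate 1/q; F_10 w.p. p1*p2 with 1/(p1 q).
    var10 = p1 * (1 - p2) / q**2 + p1 * p2 / (p1 * q) ** 2 - 1
    return var11, var10


rng = random.Random(5)
N1, N2 = set(range(600)), set(range(300, 900))  # |N1 ∪ N2| = 900, J = 1/3
draws = [estimate_distinct(N1, N2, 0.2, 0.2, rng) for _ in range(1000)]
ht_vals, l_vals = zip(*draws)
print(statistics.fmean(ht_vals), statistics.fmean(l_vals))
print(statistics.pvariance(ht_vals), statistics.pvariance(l_vals))
```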

We assume in the rest of this section that $p_1 = p_2 = p$ and that we want to estimate the size of the entire union, that is, $A = N_1 \cup N_2$. We also assume that $|N_1| = |N_2| = n$. Figure 6 shows the sample size $s$ (which is proportional to the sampling probability $p$) as a function of $n$, for the HT and L estimators. We show this dependency for fixed values of the Jaccard coefficient (denoted $J$ in the figure) and of the coefficient of variation (cv, the ratio of the standard deviation to the mean).

The L estimator allows us to use a smaller sample size (by a factor of two). When we know that $J$ is above a certain value, we can obtain tighter confidence intervals.

Asymptotically, for small sampling probability $p$, if $N = |N_1 \cup N_2|$, the variance of the HT estimator is about $N/p^2$ and its coefficient of variation is about $1/(p\sqrt{N})$, meaning that we need $p \gg 1/\sqrt{N}$ for meaningful estimates. The variance of the L estimator is about $N\left(\frac{1-J}{4p^2} + \frac{J}{2p}\right)$. If $p \ll \frac{1-J}{2J}$, the coefficient of variation is about $\sqrt{1-J}/(2p\sqrt{N})$, meaning that we need only a $\sqrt{1-J}/2$ fraction of the samples that the HT estimator needs for the same accuracy. If $p \gg \frac{1-J}{2J}$, the coefficient of variation is about $\sqrt{J/(2pN)}$, meaning that $O(1)$ samples suffice for any fixed coefficient of variation.
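The asymptotic comparison can be checked numerically from the exact per-key variances with $p_1 = p_2 = p$ and $A = N_1 \cup N_2$. The sketch below (hypothetical helper names) uses bisection to find the smallest $p$ that reaches a target coefficient of variation for each estimator, and compares the ratio of the required sampling probabilities with $\sqrt{1-J}/2$, which it should approach when $p \ll (1-J)/(2J)$.

```python
def cv_ht(N, p):
    """Coefficient of variation of the HT estimate of N = |N1 ∪ N2| with p1 = p2 = p."""
    return (N * (1 / p**2 - 1)) ** 0.5 / N


def cv_l(N, J, p):
    """Same for the L estimator: per-key variance J*VAR(1,1) + (1-J)*VAR(1,0)."""
    q = 2 * p - p * p
    var11 = 1 / q - 1
    var10 = (1 + p - p * p) / q**2 - 1
    return (N * (J * var11 + (1 - J) * var10)) ** 0.5 / N


def required_p(cv_fn, target, lo=1e-12, hi=1.0, iters=200):
    """Smallest p with cv_fn(p) <= target, by bisection (cv decreases with p)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if cv_fn(mid) <= target:
            hi = mid
        else:
            lo = mid
    return hi


N, target = 10**6, 0.1
p_ht = required_p(lambda p: cv_ht(N, p), target)
for J in (0.0, 0.5):
    p_l = required_p(lambda p: cv_l(N, J, p), target)
    print(J, p_l / p_ht, (1 - J) ** 0.5 / 2)  # observed vs asymptotic ratio
```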

8.2 Max dominance 

We demonstrate the performance of our estimators on a data set which includes hourly summaries of IP traffic, in the form of destination IP addresses and the number of active IP flows to each destination.

Figure 6: Sample size $s$ as a function of the input size $n$ (top) and ratio of sample sizes when using the L and HT estimators (bottom) required to achieve certain accuracy (measured by cv).

Figure 7 shows the performance of the $\max^{(L)}$ and $\max^{(HT)}$ estimators. Instances were from two consecutive hours, each with about 2.45 × 10* distinct destination IP addresses, with a total of 3.8 × 10* distinct destinations. The total number of flows in each hour was 5.5 × 10^ and the sum of the maximum values was 7.47 × 10^. The figure shows the normalized variance

$$\frac{\mathrm{VAR}[\sum_h \widehat{\max}(h)]}{\left(\sum_h \max(v(h))\right)^2} = \frac{\sum_h \mathrm{VAR}[\widehat{\max}(h)]}{\left(\sum_h \max(v(h))\right)^2}$$

as a function of the percentage of sampled keys. The sampling method applied to each instance was PPS Poisson (results are the same for priority sampling), and instances were sampled independently but with known seeds. The estimator $\max^{(HT)}$ is monotone but not dominant. The estimator $\max^{(L)}$ is monotone and dominant. The ratio between the variances of the two estimators on this data set, $\mathrm{VAR}[\sum \max^{(HT)}]/\mathrm{VAR}[\sum \max^{(L)}]$, varied between 2.45 and 2.7.
Figure 7: Variance (normalized) for estimating max dominance using the HT and L estimators over two independently-sampled instances with known seeds (Poisson PPS or priority sampling).



Related work 

A related and well-studied model, not mentioned in the body of the paper, is where data appears as a stream of keys and values, over which we want to estimate frequency moments and $L_p$ norms [24, 1, 30], aiming for query-specific space- and time-efficient algorithms. Our setup is fundamentally different, as the input is a sample-based summary of the data and the aim is to design good estimators for different queries.

Conclusion 

Our work laid the foundations for deriving optimal estimators for queries spanning multiple sampled instances. We demonstrated significant improvements over existing estimators for example queries over common sampling schemes. In follow-up work, we derive estimators when samples of different instances are coordinated and derive $L_p$ distance estimators over independent and coordinated samples. In the longer run, we hope that the sometimes tedious derivations of estimators can be replaced by automated tools.
APPENDIX

A. $\max^{(L)}$ FOR INDEPENDENT WEIGHTED
SAMPLES WITH KNOWN SEEDS

The minimum element of $\prec$ is $0$, and hence it is the determining vector of all outcomes consistent with $0$, which are all outcomes with $S = \emptyset$. Hence, on empty outcomes, $\max^{(L)}(S) = 0$. We next process vectors with two equal entries. The outcomes determined by $(v, v)$ are: $S = \{1, 2\}$ and $v_1 = v_2 = v$; $S = \{1\}$, $v_1 = v$, and $u_2 \ge v_1/\tau_2^*$; or $S = \{2\}$, $v_2 = v$, and $u_1 \ge v_2/\tau_1^*$. That is, outcomes where both entries are sampled and have the same value $v$, or where exactly one entry is sampled, its value is $v$, and the upper bound on the value of the other entry is at least $v$. The probability of an outcome determined by $(v, v)$ for data $(v, v)$ is $\min\{1, v/\tau_1^*\} + (1 - \min\{1, v/\tau_1^*\})\min\{1, v/\tau_2^*\}$. The estimate is therefore

$$\max^{(L)}(v, v) = \frac{v}{\min\{1, \frac{v}{\tau_1^*}\} + \left(1 - \min\{1, \frac{v}{\tau_1^*}\}\right)\min\{1, \frac{v}{\tau_2^*}\}} . \qquad (25)$$



It remains to define the estimator on outcomes that are consistent with data vectors with two different-valued entries and not consistent with data vectors with two identical entries: when $|S| = 2$ and $v_1 \ne v_2$, when $S = \{1\}$ and $u_2\tau_2^* < v_1$, or when $S = \{2\}$ and $u_1\tau_1^* < v_2$. We formulate a system of equations relating the estimate value for determining vectors of the form $(v, v - \Delta)$ ($\Delta > 0$) to the estimate value on determining vectors of the same form and smaller values of $\Delta$. The case of determining vectors of the form $(v - \Delta, v)$ is symmetric.

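Case $v - \Delta \ge \tau_2^*$: Outcomes consistent with $(v, v - \Delta)$ are $S = \{1, 2\}$, in which case the determining vector is $(v, v - \Delta)$, or $S = \{2\}$ and $u_1\tau_1^* > v > v - \Delta$, in which case the determining vector is $(v - \Delta, v - \Delta)$. The probability of $S = \{1, 2\}$ when the data is $(v, v - \Delta)$ is $\min\{1, v/\tau_1^*\}$. The probability of $S = \{2\}$ is $1 - \min\{1, v/\tau_1^*\}$. From Line 13,

$$v = \max^{(L)}(v, v - \Delta)\min\{1, \tfrac{v}{\tau_1^*}\} + \max^{(L)}(v - \Delta, v - \Delta)\left(1 - \min\{1, \tfrac{v}{\tau_1^*}\}\right) .$$

Using (25), $\max^{(L)}(v - \Delta, v - \Delta) = v - \Delta$. Substituting and solving for $\max^{(L)}(v, v - \Delta)$, we obtain

$$\max^{(L)}(v, v - \Delta) = v - \Delta + \frac{\Delta}{\min\{1, v/\tau_1^*\}} . \qquad (26)$$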



Case $v \ge \tau_1^*$ (and $v - \Delta < \tau_2^*$): Outcomes consistent with data $(v, v - \Delta)$ have $S = \{1, 2\}$ or $S = \{1\}$. Outcomes with $S = \{1, 2\}$ have determining vector $(v, v - \Delta)$ and probability $\min\{1, (v - \Delta)/\tau_2^*\}$. Outcomes with $S = \{1\}$ and $u_2\tau_2^* \ge v$ have determining vector $(v, v)$, estimate value $v$ (using (25)), and probability $1 - \min\{1, v/\tau_2^*\}$. Outcomes with $S = \{1\}$ and $v - \Delta < u_2\tau_2^* < v$ have determining vector $(v, u_2\tau_2^*)$ and probability $(\min\{v, \tau_2^*\} - v + \Delta)/\tau_2^*$. The equation in Line 13 is

$$v = \max^{(L)}(v, v)\left(1 - \min\{1, \tfrac{v}{\tau_2^*}\}\right) + \frac{1}{\tau_2^*}\int_{v-\Delta}^{\min\{v, \tau_2^*\}} \max^{(L)}(v, y)\,dy + \min\{1, \tfrac{v - \Delta}{\tau_2^*}\}\max^{(L)}(v, v - \Delta) .$$

Substituting $\max^{(L)}(v, v) = v$, we obtain that $\max^{(L)}(v, y) = v$ for all $v - \Delta \le y \le v$ is a solution.



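Case $\tau_2^* > v - \Delta$, $\tau_1^* > v$:

$$v = \max^{(L)}(v, v - \Delta)\frac{v(v - \Delta)}{\tau_1^*\tau_2^*} + \max^{(L)}(v, v)\frac{v}{\tau_1^*}\left(1 - \min\{1, \tfrac{v}{\tau_2^*}\}\right) + \max^{(L)}(v - \Delta, v - \Delta)\left(1 - \frac{v}{\tau_1^*}\right)\frac{v - \Delta}{\tau_2^*} + \frac{v}{\tau_1^*\tau_2^*}\int_{v-\Delta}^{\min\{v, \tau_2^*\}} \max^{(L)}(v, y)\,dy . \qquad (27)$$

The first term is for outcomes with $S = \{1, 2\}$. The determining vector is $(v, v - \Delta)$ and the probability given data vector $(v, v - \Delta)$ is $v(v - \Delta)/(\tau_1^*\tau_2^*)$. The second is when $S = \{1\}$ and $u_2\tau_2^* \ge v$, that is, the upper bound on $v_2$ is at least $v$. The determining vector of these outcomes is $(v, v)$. The third is when $S = \{2\}$ and $u_1\tau_1^* > v$, that is, the upper bound on the first entry is at least $v$. The determining vector of these outcomes is $(v - \Delta, v - \Delta)$. The fourth is when $S = \{1\}$ and the upper bound on the second entry is $y \in [v - \Delta, \min\{v, \tau_2^*\}]$. The determining vector is $(v, y)$. The second term is zero if $\tau_2^* \le v$.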

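We solve separately for two subcases.

Subcase $\tau_1^*, \tau_2^* > v$. We simplify (27):

$$v = \max^{(L)}(v, v - \Delta)\frac{v(v - \Delta)}{\tau_1^*\tau_2^*} + \max^{(L)}(v, v)\frac{v}{\tau_1^*}\left(1 - \frac{v}{\tau_2^*}\right) + \max^{(L)}(v - \Delta, v - \Delta)\left(1 - \frac{v}{\tau_1^*}\right)\frac{v - \Delta}{\tau_2^*} + \frac{v}{\tau_1^*\tau_2^*}\int_{v-\Delta}^{v} \max^{(L)}(v, y)\,dy .$$

We apply (25) to obtain:

$$\max^{(L)}(v, v) = \frac{\tau_1^*\tau_2^*}{\tau_1^* + \tau_2^* - v} , \qquad \max^{(L)}(v - \Delta, v - \Delta) = \frac{\tau_1^*\tau_2^*}{\tau_1^* + \tau_2^* - v + \Delta} .$$

Substituting, we obtain

$$v = \max^{(L)}(v, v - \Delta)\frac{v(v - \Delta)}{\tau_1^*\tau_2^*} + \frac{v(\tau_2^* - v)}{\tau_1^* + \tau_2^* - v} + \frac{(\tau_1^* - v)(v - \Delta)}{\tau_1^* + \tau_2^* - v + \Delta} + \frac{v}{\tau_1^*\tau_2^*}\int_{v-\Delta}^{v} \max^{(L)}(v, y)\,dy .$$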



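We define, for $\Delta \ge x \ge 0$, $g(x) = \max^{(L)}(v, v - x)$ and $G(x) = \int g(x)\,dx$. Rewriting the above, we obtain

$$v = g(\Delta)\frac{v(v - \Delta)}{\tau_1^*\tau_2^*} + \frac{v(\tau_2^* - v)}{\tau_1^* + \tau_2^* - v} + \frac{(\tau_1^* - v)(v - \Delta)}{\tau_1^* + \tau_2^* - v + \Delta} + \frac{v}{\tau_1^*\tau_2^*}\left(G(\Delta) - G(0)\right) .$$

Taking a partial derivative with respect to $\Delta$,

$$0 = g'(\Delta)\frac{v(v - \Delta)}{\tau_1^*\tau_2^*} - g(\Delta)\frac{v}{\tau_1^*\tau_2^*} - \frac{(\tau_1^* - v)(\tau_1^* + \tau_2^*)}{(\tau_1^* + \tau_2^* - v + \Delta)^2} + \frac{v}{\tau_1^*\tau_2^*}g(\Delta) ,$$

and hence

$$\frac{dg(\Delta)}{d\Delta} = \frac{\tau_1^*\tau_2^*(\tau_1^* - v)(\tau_1^* + \tau_2^*)}{v(\tau_1^* + \tau_2^* - v + \Delta)^2(v - \Delta)} .$$

We use $g(0) = \frac{\tau_1^*\tau_2^*}{\tau_1^* + \tau_2^* - v}$ and the derivative to determine $g(\Delta) = \max^{(L)}(v, v - \Delta)$:

$$\max^{(L)}(v, v - \Delta) = g(\Delta) = g(0) + \int_0^{\Delta}\frac{dg(x)}{dx}\,dx = \frac{\tau_1^*\tau_2^*}{\tau_1^* + \tau_2^* - v} + \frac{\tau_1^*\tau_2^*(\tau_1^* - v)(\tau_1^* + \tau_2^*)}{v}\int_0^{\Delta}\frac{dx}{(\tau_1^* + \tau_2^* - v + x)^2(v - x)} .$$

Integrating, we obtain

$$\int_0^{\Delta}\frac{dx}{(\tau_1^* + \tau_2^* - v + x)^2(v - x)} = \frac{1}{(\tau_1^* + \tau_2^*)^2}\ln\frac{(\tau_1^* + \tau_2^* - v + \Delta)v}{(\tau_1^* + \tau_2^* - v)(v - \Delta)} + \frac{\Delta}{(\tau_1^* + \tau_2^*)(\tau_1^* + \tau_2^* - v + \Delta)(\tau_1^* + \tau_2^* - v)} . \qquad (28)$$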

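Substituting:

$$\max^{(L)}(v, v - \Delta) = \frac{\tau_1^*\tau_2^*}{\tau_1^* + \tau_2^* - v} + \frac{\tau_1^*\tau_2^*(\tau_1^* - v)}{v(\tau_1^* + \tau_2^*)}\ln\frac{(\tau_1^* + \tau_2^* - v + \Delta)v}{(\tau_1^* + \tau_2^* - v)(v - \Delta)} + \frac{\Delta\,\tau_1^*\tau_2^*(\tau_1^* - v)}{v(\tau_1^* + \tau_2^* - v + \Delta)(\tau_1^* + \tau_2^* - v)} . \qquad (29)$$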
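Subcase $\tau_1^* > v > \tau_2^* > v - \Delta$: Simplifying (27),

$$v = \max^{(L)}(v, v - \Delta)\frac{v(v - \Delta)}{\tau_1^*\tau_2^*} + \max^{(L)}(v - \Delta, v - \Delta)\frac{(\tau_1^* - v)(v - \Delta)}{\tau_1^*\tau_2^*} + \frac{v}{\tau_1^*\tau_2^*}\int_{v-\Delta}^{\tau_2^*} \max^{(L)}(v, y)\,dy .$$

Footnote: We change variables: $y = \tau_1^* + \tau_2^* - v + x$. Then $v - x = \tau_1^* + \tau_2^* - y$, and the integral becomes $\int \frac{dy}{y^2(B - y)}$, where we use $B = \tau_1^* + \tau_2^*$. We have (in the range $y \in (0, B)$): $\int \frac{dy}{y^2(B - y)} = \frac{1}{B^2}\ln\frac{y}{B - y} - \frac{1}{By}$.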
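Footnote: The expressions stated in the table in Figure 3 are from (26), (29), and (30).

We substitute, using (25):

$$\max^{(L)}(v - \Delta, v - \Delta) = \frac{\tau_1^*\tau_2^*}{\tau_1^* + \tau_2^* - v + \Delta} .$$

We obtain

$$v = \max^{(L)}(v, v - \Delta)\frac{v(v - \Delta)}{\tau_1^*\tau_2^*} + \frac{(\tau_1^* - v)(v - \Delta)}{\tau_1^* + \tau_2^* - v + \Delta} + \frac{v}{\tau_1^*\tau_2^*}\int_{v-\Delta}^{\tau_2^*} \max^{(L)}(v, y)\,dy .$$

Simplifying, and using $g(x) = \max^{(L)}(v, v - x)$ and $G(x) = \int g(x)\,dx$,

$$v = g(\Delta)\frac{v(v - \Delta)}{\tau_1^*\tau_2^*} + \frac{(\tau_1^* - v)(v - \Delta)}{\tau_1^* + \tau_2^* - v + \Delta} + \frac{v}{\tau_1^*\tau_2^*}\left(G(\Delta) - G(v - \tau_2^*)\right) .$$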
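Taking a partial derivative with respect to $\Delta$:

$$0 = g'(\Delta)\frac{v(v - \Delta)}{\tau_1^*\tau_2^*} - g(\Delta)\frac{v}{\tau_1^*\tau_2^*} - \frac{(\tau_1^* - v)(\tau_1^* + \tau_2^*)}{(\tau_1^* + \tau_2^* - v + \Delta)^2} + \frac{v}{\tau_1^*\tau_2^*}g(\Delta) .$$

Simplifying,

$$g'(\Delta) = \frac{\tau_1^*\tau_2^*(\tau_1^* - v)(\tau_1^* + \tau_2^*)}{(\tau_1^* + \tau_2^* - v + \Delta)^2\,v(v - \Delta)} .$$

Thus

$$g(\Delta) = g(v - \tau_2^*) + \int_{v-\tau_2^*}^{\Delta} g'(x)\,dx .$$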



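Using (26), $g(v - \tau_2^*) = \max^{(L)}(v, \tau_2^*) = \tau_1^* + \tau_2^* - \frac{\tau_1^*\tau_2^*}{v}$. Hence,

$$\max^{(L)}(v, v - \Delta) = \tau_1^* + \tau_2^* - \frac{\tau_1^*\tau_2^*}{v} + \frac{\tau_1^*\tau_2^*(\tau_1^* - v)(\tau_1^* + \tau_2^*)}{v}\int_{v-\tau_2^*}^{\Delta}\frac{dx}{(\tau_1^* + \tau_2^* - v + x)^2(v - x)} .$$

Using (28),

$$\int_{v-\tau_2^*}^{\Delta}\frac{dx}{(\tau_1^* + \tau_2^* - v + x)^2(v - x)} = \frac{1}{(\tau_1^* + \tau_2^*)^2}\ln\frac{(\tau_1^* + \tau_2^* - v + \Delta)\tau_2^*}{\tau_1^*(v - \Delta)} + \frac{\tau_2^* - v + \Delta}{(\tau_1^* + \tau_2^*)(\tau_1^* + \tau_2^* - v + \Delta)\tau_1^*} ,$$

so

$$\max^{(L)}(v, v - \Delta) = \tau_1^* + \tau_2^* - \frac{\tau_1^*\tau_2^*}{v} + \frac{\tau_1^*\tau_2^*(\tau_1^* - v)}{v(\tau_1^* + \tau_2^*)}\ln\frac{(\tau_1^* + \tau_2^* - v + \Delta)\tau_2^*}{\tau_1^*(v - \Delta)} + \frac{\tau_2^*(\tau_1^* - v)(\tau_2^* - v + \Delta)}{(\tau_1^* + \tau_2^* - v + \Delta)v} . \qquad (30)$$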
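The closed forms (25), (26), (29), and (30) can be sanity-checked numerically. Substituting them back into the unbiasedness equation (27) should leave a residual of (numerically) zero. The sketch below is such a check, not part of the derivation. It writes `t1`, `t2` for $\tau_1^*, \tau_2^*$, covers only determining vectors of the form $(v, v - \Delta)$ with $0 \le \Delta < v$, and uses composite Simpson's rule for the integral term. All names are hypothetical.

```python
import math


def max_l_equal(v, t1, t2):
    """(25): estimate on outcomes whose determining vector is (v, v)."""
    p1, p2 = min(1.0, v / t1), min(1.0, v / t2)
    return v / (p1 + (1 - p1) * p2)


def max_l(v, d, t1, t2):
    """max^(L)(v, v - d) for 0 <= d < v, following (26), the case v >= t1, (29), (30)."""
    w = v - d
    if w >= t2:  # (26)
        return w + d / min(1.0, v / t1)
    if v >= t1:  # case v >= tau_1^*: the estimate is v
        return v
    B, C = t1 + t2 - v, t1 + t2
    if t2 > v:  # (29): subcase tau_1^*, tau_2^* > v
        return (t1 * t2 / B
                + t1 * t2 * (t1 - v) / (v * C) * math.log((B + d) * v / (B * w))
                + d * t1 * t2 * (t1 - v) / (v * (B + d) * B))
    # (30): subcase tau_1^* > v > tau_2^* > v - d
    return (C - t1 * t2 / v
            + t1 * t2 * (t1 - v) / (v * C) * math.log((B + d) * t2 / (t1 * w))
            + t2 * (t1 - v) * (t2 - v + d) / ((B + d) * v))


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return s * h / 3


def residual(v, d, t1, t2):
    """Right-hand side of (27) minus v; should vanish when t1 > v and t2 > v - d."""
    w = v - d
    rhs = (max_l(v, d, t1, t2) * v * w / (t1 * t2)
           + max_l_equal(v, t1, t2) * (v / t1) * (1 - min(1.0, v / t2))
           + max_l_equal(w, t1, t2) * (1 - v / t1) * w / t2
           + v / (t1 * t2) * simpson(lambda y: max_l(v, v - y, t1, t2), w, min(v, t2)))
    return rhs - v


print(residual(0.5, 0.3, 1.0, 1.2))   # subcase of (29)
print(residual(0.5, 0.35, 1.0, 0.4))  # subcase of (30)
```

Both residuals should print as values on the order of floating-point rounding error.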